Auquan Blog

Comparing Our Model's Predictions Against What Actually Happened

May 1, 2020

In our previous posts, we’ve covered the problem with modelling COVID-19 and our approach to modelling the disease, which is a mix of compartmental models, statistics and machine learning. This post analyses the model’s predictions from the last 6 weeks to see how its expected outcomes differ from reality.

The biggest benefit of our modelling approach is the ability to account for asymptomatic infections and model their properties separately from the reported infections. This means that our model shouldn’t be as susceptible to changes in reported cases caused by new testing regimes. In addition, our model estimates suitable parameters from reported confirmed cases and deaths every day. The model therefore isn’t static: it updates and improves as new data becomes available.
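
To make the idea of modelling unreported infections separately more concrete, here is a minimal sketch of a compartmental model that splits infections into a reported and an unreported group. It is an illustration only, not the model used for the forecasts in this post: the compartment structure, the function name `simulate` and every parameter value (`beta`, `gamma`, `report_frac`, `seed`) are assumptions chosen for readability, not fitted estimates.

```python
def simulate(population, days, beta=0.3, gamma=0.1, report_frac=0.1, seed=10.0):
    """Discrete-time SIR with infections split into reported and unreported.

    All parameter values are hypothetical placeholders, not fitted estimates.
    """
    s = population - seed                 # susceptible
    i_rep = seed * report_frac            # infections that appear in official counts
    i_unrep = seed * (1 - report_frac)    # asymptomatic or untested infections
    r = 0.0                               # removed (recovered or died)
    history = []
    for day in range(days):
        # Both infected groups are assumed to transmit at the same rate.
        new_inf = min(beta * s * (i_rep + i_unrep) / population, s)
        rec_rep = gamma * i_rep
        rec_unrep = gamma * i_unrep
        s -= new_inf
        i_rep += report_frac * new_inf - rec_rep
        i_unrep += (1 - report_frac) * new_inf - rec_unrep
        r += rec_rep + rec_unrep
        history.append((day, s, i_rep, i_unrep, r))
    return history


history = simulate(population=1_000_000, days=200)
# Day on which total active infections (reported + unreported) are highest.
peak_day = max(history, key=lambda row: row[2] + row[3])[0]
print("Peak of active infections on day", peak_day)
```

In a sketch like this, a change in testing would only alter `report_frac`, while the underlying epidemic curve is driven by `beta` and `gamma`. Refitting those values each day as new data arrives is what would make such a model non-static.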

To see this, let’s look at the performance of our model for reported confirmed cases, deaths and infection peaks. We’re going to look at countries in two groups: those past their peak and those close to peaking.

Past The Peak

In this section we’re going to look at the following countries and regions:

  • Germany
  • Spain
  • Italy
  • New York

Germany has very effectively managed the coronavirus outbreak and has been past its peak for three weeks now. The graph below shows the reported confirmed cases against our weekly model predictions.

The first two graphs in this figure show active cases and confirmed cases of Covid-19 respectively. The black lines represent actual values reported by the states and the coloured lines are forecasts made at different points in time. The key lists the forecasts in chronological order, so blue is the oldest (here run on the 2nd April) and purple is the most recent (run on the 30th April). Looking at the forecasts in this order, we can see a clear pattern: the predicted peak of infections arrives earlier and becomes lower as the model ingests more information about the country’s social distancing measures. After the first couple of weeks, the model’s predictions become consistent and close to the actual peak.

We can run the same analysis and this time look at how the model predicts deaths. We can see a similar pattern, where the initial forecasts overestimate deaths (as they are based on pre-lockdown data) before they then stabilise around the current trajectory.

The descriptive statistics from the model show that the coronavirus’s fatality rate for reported cases (cCFR) should be about 5% and the true mortality rate (IFR) is about 0.2%. This would imply that in Germany only about 1 in 25 cases of the disease is tested and recorded as an official case.

  • Actual Peak: April 6, 2020
  • Predicted cCFR: Deaths / Reported Cases = 5.6%
  • Predicted IFR: Deaths / Total Cases = 0.2%
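
The step from the two fatality rates to the share of cases that get reported can be sketched as follows. This assumes that the same deaths appear in both ratios, so that dividing the IFR by the cCFR gives reported cases divided by total cases. The function name `implied_detection_fraction` is ours, for illustration only.

```python
def implied_detection_fraction(ifr, ccfr):
    """Share of all infections that appear as reported cases.

    Assumes deaths = ifr * total_cases = ccfr * reported_cases,
    so reported_cases / total_cases = ifr / ccfr.
    """
    if not 0 < ifr <= ccfr:
        raise ValueError("expected 0 < ifr <= ccfr")
    return ifr / ccfr


# Germany, using the rounded 5% cCFR quoted in the text.
frac = implied_detection_fraction(ifr=0.002, ccfr=0.05)
print(f"about 1 in {1 / frac:.0f} cases reported")

# The same calculation with the 5.6% cCFR from the list above.
frac_alt = implied_detection_fraction(ifr=0.002, ccfr=0.056)
print(f"about 1 in {1 / frac_alt:.0f} cases reported")
```

With the rounded 5% figure this gives roughly 1 in 25, and with 5.6% roughly 1 in 28, so the conclusion is not sensitive to the rounding.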

In the results for Spain we see a similar pattern to Germany across all three graphs. The first two weeks, when there was not much social distancing data, show overestimates; the forecasts then rapidly flatten and the peaks come earlier as social distancing takes effect. We can see that the predictions for confirmed cases and deaths are particularly stable. The active cases graph is mostly similar, but also shows Spain’s slight double peak.

For the descriptive statistics, you’ll notice that the predicted IFR (the total fatality rate, including very mild and asymptomatic cases that are not reported) is similar to Germany’s. The cCFR, however, is significantly higher. Some of this difference is caused by differences in testing, but population differences and healthcare provisioning also play a part. We will see below that Italy’s cCFR is even higher.

  • Actual Peak: April 25, 2020
  • Predicted cCFR: Deaths / Reported Cases = 12%
  • Predicted IFR: Deaths / Total Cases = 0.4%

Italy shows the same basic pattern of overestimation for the first two weeks, before forecasts become more stable. In the active cases forecasts we can see some inaccuracies even at the prediction start points (where the forecast should equal the reported data). This is caused by discrepancies in the reporting of recoveries. If we then look at the confirmed cases and fatality rates, we see that, for most forecasts, the model is overestimating the number of cases and simultaneously underestimating the fatalities. We would expect the model to account for these differences more quickly, but it shows that the seriousness of cases in Italy has been higher than in other countries.

For the descriptive statistics, notice how both the cCFR and IFR are multiples of Germany’s values. This suggests that outcomes are significantly worse for people who get Covid-19 in Italy than in Germany. This might be due to different population compositions, healthcare availability, some unknown medical factor or an environmental condition.

  • Actual Peak: April 19, 2020
  • Predicted cCFR: Deaths / Reported Cases = 14.5%
  • Predicted IFR: Deaths / Total Cases = 0.6%
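
To put a number on "multiples of Germany’s values", the short sketch below divides each country’s predicted rates by Germany’s. The figures are the predicted cCFR and IFR values quoted in the lists above; the helper name `ratio_to_baseline` is hypothetical.

```python
# Predicted values quoted in this post, expressed as fractions.
stats = {
    "Germany": {"ccfr": 0.056, "ifr": 0.002},
    "Spain": {"ccfr": 0.12, "ifr": 0.004},
    "Italy": {"ccfr": 0.145, "ifr": 0.006},
}


def ratio_to_baseline(stats, baseline="Germany"):
    """Express each country's cCFR and IFR as a multiple of the baseline's."""
    base = stats[baseline]
    return {
        name: {key: values[key] / base[key] for key in ("ccfr", "ifr")}
        for name, values in stats.items()
    }


ratios = ratio_to_baseline(stats)
for name, r in ratios.items():
    print(f"{name}: cCFR x{r['ccfr']:.1f}, IFR x{r['ifr']:.1f}")
```

On these figures Italy’s cCFR is about 2.6 times Germany’s and its IFR about 3 times, while Spain sits at roughly 2.1 and 2 times respectively.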

New York

By now, early forecasts overestimating should be a familiar theme, so we won’t mention it going forward. An interesting point to note about the New York data is the complete lack of reliable recovery data. This means that whenever the model starts predicting active cases, it expects a decline from the starting point as older active cases start to be recorded as recoveries.

Besides this issue with active cases, we can see that the model’s predictions for confirmed cases and deaths have accurately tracked the reported values.

What’s interesting to note with New York is that, although reported cases stand at 167,000, our model gives a rough estimate of 3.6 million for the true number of infections (based on the IFR below and a current death toll of 12,976). This aligns with recent random sampling studies, which show that 21% of New Yorkers may have been infected (1.68m in NYC alone).

  • Actual Peak: April 29, 2020
  • Predicted cCFR: Deaths / Reported Cases = 6.4%
  • Predicted IFR: Deaths / Total Cases = 0.36%
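
The rough estimate of true infections in New York follows from dividing the death toll by the IFR. The sketch below reproduces that arithmetic using the figures quoted in this section. It is a back-of-the-envelope calculation that ignores the lag between infection and death, and the function name `estimate_total_infections` is ours.

```python
def estimate_total_infections(deaths, ifr):
    """Back out total infections from deaths, given IFR = deaths / total cases."""
    if deaths < 0 or not 0 < ifr <= 1:
        raise ValueError("deaths must be >= 0 and ifr must be in (0, 1]")
    return deaths / ifr


ny_deaths = 12_976      # death toll quoted in the text
ny_ifr = 0.0036         # predicted IFR of 0.36%
ny_reported = 167_000   # reported cases quoted in the text

total = estimate_total_infections(ny_deaths, ny_ifr)
reported_share = ny_reported / total
print(f"Estimated total infections: {total:,.0f}")
print(f"Share of infections reported: {reported_share:.1%}")
```

This gives about 3.6 million infections, which would mean fewer than 1 in 20 infections had been recorded as a confirmed case.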

Near or Yet to Peak

In this section we’re going to look at the following countries:

  • UK
  • USA

At first glance it seems like the UK follows a similar pattern, but on closer inspection we can see that the first four projections move down and to the left. The final one, however, moves up, keeping the same peak time (this week) but with more active cases than originally expected. Confirmed cases have consistently been lower than projections, which is likely due to the UK’s slow scaling of testing. Finally, deaths have followed a low trajectory so far, showing that recorded outcomes have been better than expected, but this may change following the changes in death reporting by the UK.

  • Expected Peak: April 29, 2020
  • Predicted cCFR: Deaths / Reported Cases = 13%
  • Predicted IFR: Deaths / Total Cases = 0.2%

Unlike New York, the rest of the United States is at varying points behind the peak, so the country as a whole is still on an upward trajectory. Despite difficulties in the lockdown process, we can see the enormous life-saving effect it is having. The model predicts that the US should reach peak cases in the next week.

  • Expected Peak: May 3, 2020
  • Predicted cCFR: Deaths / Reported Cases = 8%
  • Predicted IFR: Deaths / Total Cases = 0.7%
