How We're Predicting The Coronavirus Pandemic - Our SEIR Model
April 23, 2020 by Vishal
This article takes an in-depth look at the model we use to predict coronavirus infections and death rates. We explain the steps we've taken to avoid the pitfalls of a standard virus model in the specific case of COVID-19. If you're new to the topic of disease modelling, you can check out our primer article here.
The Problems With Modelling COVID-19
The starting point for our model is a compartmental SEIR model.
One major requirement for fitting a standard compartment model is that we accurately know the number of people in each compartment at every point in time up to the present. With COVID-19, where testing is limited and inconsistent, this assumption doesn't hold. Whilst some countries, notably South Korea and Iceland, quickly tested large portions of the population, many countries only test people showing mild to serious symptoms. This gives us our first problem: the number of infections reported by a country isn't representative of the total number of infected people.
Another important assumption in compartment models is that people in the same compartment have similar properties. So, people in the infectious compartment should be spreading infections at the same rate on average. This assumption does not hold for COVID-19 either. In most countries, people who are reported as infected (a confirmed test) are placed into quarantine. Unreported infections, who often show mild symptoms or are asymptomatic, are not under such strict distancing, so they spread the disease at a different rate. This gives us our second problem: the infected population has heterogeneous characteristics.
Overview of Our Approach
- We start with a standard SEIR model and break down the Infectious compartment (I) into reported infections (Ir) and unreported infections (Iu).
- People from the Exposed compartment (E) with symptoms go into the reported infections compartment (Ir). People with mild or no symptoms go into the unreported infections compartment (Iu).
- Both reported infections (Ir) and unreported infections (Iu) contribute to disease spread, but at different rates: a low rate for the quarantined people in compartment Ir, and a high rate for the often asymptomatic people in compartment Iu. This is how the model accounts for the quarantine measures countries have taken to limit infections.
- Some people in reported infections (Ir) recover (compartment C) and some die (compartment D).
- All people in unreported infections (Iu) recover (compartment Cu); there are no mortalities in this compartment. If their symptoms were serious enough, they would be reported (and therefore be in compartment Ir). The threshold for this will differ between countries, but we have assumed that it is similar.
A flow through the compartments in our model is shown below.
The differential equations describing the flow are also given below.
A few things to note:
- 𝛽1 is the rate of spread from the Ir compartment and 𝛽2 is the rate of spread from the Iu compartment. The reason for using two rates is discussed above.
- A fraction 𝜖 of Exposed people show symptoms and are reported as Infectious (Ir). The remaining fraction, (1 - 𝜖), are infectious but unreported (Iu).
- People move from Exposed to infectious buckets at a rate 𝛼 = 1/incubation period
- A certain percentage of Reported Infectious (Ir) people Die (D) at a rate 𝛿 = 1/time to death. The remaining recover at a rate 𝜂 = 1/time to recovery (reported). The percentage of Reported Infectious (Ir) people who Die (D) is given by cCFR, the confirmed Case Fatality Rate.
- All unreported Infectious (Iu) people Recover (Cu) at a rate 𝜁 = 1/time to recovery unreported
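As a rough sketch, the notes above can be written as the following system. This is one reading of the flows described in this article, not necessarily the exact equations used in the tool. It assumes a fixed total population N with new exposures proportional to S/N, treats 𝜖 and cCFR as fractions between 0 and 1, and sends the fraction 𝜖 of people leaving E into Ir:

```latex
\begin{aligned}
\frac{dS}{dt}   &= -\frac{(\beta_1 I_r + \beta_2 I_u)\,S}{N} \\
\frac{dE}{dt}   &= \frac{(\beta_1 I_r + \beta_2 I_u)\,S}{N} - \alpha E \\
\frac{dI_r}{dt} &= \epsilon\,\alpha E - \mathrm{cCFR}\,\delta\, I_r - (1-\mathrm{cCFR})\,\eta\, I_r \\
\frac{dI_u}{dt} &= (1-\epsilon)\,\alpha E - \zeta\, I_u \\
\frac{dC}{dt}   &= (1-\mathrm{cCFR})\,\eta\, I_r \\
\frac{dD}{dt}   &= \mathrm{cCFR}\,\delta\, I_r \\
\frac{dC_u}{dt} &= \zeta\, I_u
\end{aligned}
```

The right-hand sides sum to zero, so S + E + Ir + Iu + C + Cu + D stays equal to N throughout the simulation.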
We then fit our model to reported confirmed cases and reported deaths, which allows us to estimate 𝛽1 and 𝛽2. For the other parameters, we use estimates from other studies and fit the data within a range around those estimates. We discuss the details of the model fitting process below.
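To make the compartment flows concrete, here is a minimal Python sketch that steps the model forward in time. The function names, the forward-Euler integrator, the exact form of the rate terms and every parameter value are illustrative assumptions, not the tool's implementation or its fitted values.

```python
def derivatives(state, p, n):
    """Rates of change for (S, E, Ir, Iu, C, Cu, D), following one reading of the flows above."""
    s, e, ir, iu, c, cu, d = state
    new_exposed = (p["beta1"] * ir + p["beta2"] * iu) * s / n  # assumed S/N mixing term
    onset = p["alpha"] * e                          # exposed people becoming infectious
    deaths = p["ccfr"] * p["delta"] * ir            # reported cases that die
    rec_reported = (1 - p["ccfr"]) * p["eta"] * ir  # reported cases that recover
    rec_unreported = p["zeta"] * iu                 # unreported cases all recover
    return (
        -new_exposed,
        new_exposed - onset,
        p["epsilon"] * onset - deaths - rec_reported,
        (1 - p["epsilon"]) * onset - rec_unreported,
        rec_reported,
        rec_unreported,
        deaths,
    )


def simulate(state, p, days, steps_per_day=10):
    """Forward-Euler integration; returns one state tuple per day, including day 0."""
    n = sum(state)  # total population, constant in this model
    dt = 1.0 / steps_per_day
    history = [tuple(state)]
    for _ in range(days):
        for _ in range(steps_per_day):
            rates = derivatives(state, p, n)
            state = tuple(x + dt * r for x, r in zip(state, rates))
        history.append(state)
    return history


# Hypothetical values, chosen only so the example runs.
params = {
    "beta1": 0.05, "beta2": 0.4,   # spread from Ir (quarantined) and Iu
    "alpha": 1 / 5,                # 1 / incubation period
    "epsilon": 0.2,                # fraction of exposed people who get reported
    "ccfr": 0.03,                  # fraction of reported cases that die
    "delta": 1 / 15, "eta": 1 / 20, "zeta": 1 / 10,
}
initial = (1_000_000.0, 500.0, 20.0, 80.0, 0.0, 0.0, 0.0)  # S, E, Ir, Iu, C, Cu, D
history = simulate(initial, params, days=60)
```

Forward Euler is used here only to keep the sketch dependency-free; a library ODE solver would be the more usual choice for real fitting work.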
The Specifics Of Our Approach
As described above, we need to know two sets of values to be able to predict infections: the initial populations for each of the compartments (S, E, Ir, Iu, C, Cu and D), plus the parameters of the differential equations. Some of the initial values can be gathered from reported data, but others we have to estimate. To do this, and to generate the differential equation parameters, we run an optimisation that fits our generated curve to the actual numbers of confirmed cases, recoveries and deaths for every country.
- We estimate initial reported deaths 𝐷(0) as deaths reported on the first day of simulation.
- We estimate initial reported recoveries C(0) as recoveries reported on the first day of simulation.
- We estimate initial reported infections 𝐼(r0) as (Confirmed Cases - recoveries - deaths) reported on the first day of simulation.
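The three reported starting values above amount to simple arithmetic on the cumulative counts for day 0. A small sketch, with a hypothetical function name and made-up counts:

```python
def initial_reported_state(confirmed, recovered, deaths):
    """Return (Ir0, C0, D0) from cumulative counts reported on the first day of simulation."""
    d0 = deaths                           # D(0): reported deaths
    c0 = recovered                        # C(0): reported recoveries
    ir0 = confirmed - recovered - deaths  # I(r0): reported cases that are still active
    return ir0, c0, d0


# Made-up counts for illustration only.
ir0, c0, d0 = initial_reported_state(confirmed=1200, recovered=150, deaths=30)
print(ir0, c0, d0)
```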
Creating Estimates for Unknown Variables
For the following parameters, we start with initial estimates but let the model find a fit within tight bounds based on real-world data. We bound the optimisation because, whilst we acknowledge a significant external component, we expect the virus's underlying etiology to be similar across countries, and therefore expect similar mortality and morbidity rates at a population level. We then let the model identify the exact value within these bounds. Below we describe the process for generating estimates for each of the variables:
Incubation Period, Time to Death, Time to Recovery: We start with estimates of incubation period, time to death and time to recovery, and set bounds around these for the model to find reasonable estimates. These bounds are based on published studies and empirical research.
Initial Value for the population of E (E(0)): We fit the initial number of exposed people, starting from an initial guess of 100 * Confirmed Cases today.
Total Fatality Rate/Infection Fatality Rate (IFR): IFR is defined as deaths/total infected population. To get an initial value for IFR, we use data from South Korea, since they have done extensive testing. Another country you could look at here would be Iceland.
Current Total Infections (I): The current total infections are I(0) = reported infections (I(r0)) + unreported infections (I(u0)). Using the IFR calculated above and the total reported deaths on the 15th day of simulation, 𝐷(15), we can estimate the total infected population I(0) as 𝐷(15)/IFR. When incorporating reported deaths, it is important to include the time lag from symptom onset to death (15 days), which is why deaths on day 15 are used to estimate infections on day 0.
Unreported Infections (Iu): We then estimate initial unreported infections 𝐼(u0) as 𝐼(0) - 𝐼(r0) (as defined above).
Unreported Recoveries: Similarly, using 𝐼𝐹𝑅 and reported deaths 𝐷(0), we estimate the total infected population 15 days prior to the first day of simulation, 𝐼(−15). We estimate initial unreported recoveries 𝐶𝑢(0) as 𝐼(−15) - 𝐼(r0) - 𝐷(0) - 𝐶(0), presuming that everybody who was infected 15 days earlier and hasn't been reported yet must have recovered.
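The back-calculation of the unreported compartments from deaths and IFR can be sketched as follows. The function name, the IFR value and all counts are hypothetical; only the formulas and the 15-day lag come from the text above.

```python
LAG_DAYS = 15  # lag from symptom onset to death quoted in this article


def estimate_unreported(deaths_by_day, ifr, ir0, c0):
    """Estimate (Iu0, Cu0); deaths_by_day[t] is cumulative reported deaths on simulation day t."""
    d0 = deaths_by_day[0]
    total_infected_now = deaths_by_day[LAG_DAYS] / ifr  # I(0) = D(15) / IFR
    iu0 = total_infected_now - ir0                      # I(u0) = I(0) - I(r0)
    total_infected_past = d0 / ifr                      # I(-15) = D(0) / IFR
    cu0 = total_infected_past - ir0 - d0 - c0           # Cu(0) = I(-15) - I(r0) - D(0) - C(0)
    return iu0, cu0


# Made-up inputs: deaths rise from 30 on day 0 to 120 on day 15.
deaths_by_day = [30 + 6 * t for t in range(LAG_DAYS + 1)]
iu0, cu0 = estimate_unreported(deaths_by_day, ifr=0.01, ir0=1020, c0=150)
```

With sparse or noisy data, these differences can come out negative. The sketch does not handle that case.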
Case Fatality Rate (cCFR): We use data from China for cCFR. cCFR can be defined as deaths/total confirmed cases or as deaths/closed cases, where closed cases = (recovered cases + deaths).
Since every case will eventually either recover or die, these two should eventually converge to the same number. We take the average of that asymptotic value as our initial estimate for cCFR. In this case as well, we let the model find the exact value.
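The two cCFR definitions are easy to compare side by side. The sketch below takes "the average" to mean the mean of the two ratios late in an outbreak, which is one reading of the text. The function names and counts are hypothetical and are not actual figures for China.

```python
def ccfr_estimates(confirmed, recovered, deaths):
    """Return cCFR computed against all confirmed cases and against closed cases."""
    by_confirmed = deaths / confirmed
    closed = recovered + deaths        # cases with a known outcome
    by_closed = deaths / closed
    return by_confirmed, by_closed


def initial_ccfr(confirmed, recovered, deaths):
    """Average the two estimates (assumed interpretation of the averaging step)."""
    by_confirmed, by_closed = ccfr_estimates(confirmed, recovered, deaths)
    return (by_confirmed + by_closed) / 2


# Made-up late-stage counts, when most cases are already closed.
by_confirmed, by_closed = ccfr_estimates(confirmed=80_000, recovered=76_000, deaths=3_200)
guess = initial_ccfr(confirmed=80_000, recovered=76_000, deaths=3_200)
```

Because closed cases can never exceed confirmed cases, the closed-case ratio approaches the limit from above and the confirmed-case ratio from below, so the gap between them shrinks as open cases resolve.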
Optimising These Values
Once we have these initial estimates to describe the disease and its spread, we let the model optimise them against the real-world data. To do this, we fit the various aspects as follows:
- We fit reported confirmed infections (reported active infections + reported deaths + reported recoveries) from our model to the true reported confirmed infections.
- We fit deaths from our model to reported deaths.
The final residual function, which the model tries to minimise to achieve the best fit, is the sum of the residuals from Infections and Deaths.
Note: During this process we found that the reported recoveries data was inconsistent, so we ignore it during model fitting.
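A combined residual of this kind might look like the sketch below. The article does not specify the error measure or any weighting, so the squared error and the equal weights are assumptions, as are the function names and the short example series.

```python
def confirmed_from_model(ir, c, d):
    """Model-side confirmed cases: active reported infections + recoveries + deaths."""
    return [a + b + e for a, b, e in zip(ir, c, d)]


def sum_squared_error(model, observed):
    """Squared-error residual between two equal-length daily series (assumed error measure)."""
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def total_residual(model_confirmed, model_deaths, obs_confirmed, obs_deaths):
    """Sum of the infections residual and the deaths residual, equally weighted."""
    return (sum_squared_error(model_confirmed, obs_confirmed)
            + sum_squared_error(model_deaths, obs_deaths))


# Made-up three-day series.
model_confirmed = confirmed_from_model(ir=[90, 130, 180], c=[8, 17, 25], d=[2, 3, 5])
residual = total_residual(
    model_confirmed, [2, 3, 5],
    obs_confirmed=[100, 140, 220], obs_deaths=[2, 4, 5],
)
```

An optimiser would then search the bounded parameter space for the values that make this number as small as possible.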
Hopefully this gives you some insight into how our prediction tool works. If you want to see how changing some of these variables affects the projected course of the COVID-19 pandemic, you can. We've hosted our tool at https://covid19-infection-model.auquan.com/. Go take a look now, and we'll also write a future article explaining how to use the model.
Auquan is a data science solutions provider for asset managers and hedge funds. Our state of the art technology empowers Portfolio Managers to stay ahead of the trend and achieve better returns.