Methods
Raw case numbers
- World country dataset: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) dataset, which is updated daily in DATA1.
The names of the latest time series files (since 22/3):
- time_series_covid19_confirmed_global.csv for cumulative confirmed cases.
- time_series_covid19_deaths_global.csv for cumulative deaths.
- Spanish region dataset. Confirmed cases and deaths by Autonomous Community of Spain (or province), available at Situation of COVID-19 in Spain from Instituto de Salud Carlos III. See section ‘Documentación y datos’.
- Italian region dataset. Confirmed cases and deaths by region of Italy, available at COVID-19 Italia - Monitoraggio situazione from Presidenza del Consiglio dei Ministri - Dipartimento della Protezione Civile. Data updated daily in DATA3.
Variable definitions:
- Cumulative cases at day \(t\): \(x_t^{(j)}\) with \(j\in \{1,2\}\) denoting confirmed cases and deaths, respectively.
- New cases at day \(t\): \(x_t^{(j)} - x_{t-1}^{(j)}\)
- Growth rate of cases, \(H_k\): \(r_{k}^{(j)}(t)=\frac{x_{t+k}^{(j)} - x_{t}^{(j)}}{x_{t}^{(j)} + 1}\) for \(t=\ldots,t_0-1\) and \(k=1,\ldots,5\)
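The definitions above can be illustrated with a minimal Python sketch. The function names and the toy series are hypothetical and only for illustration; the formulas are the ones stated in the list, including the \(+1\) in the denominator that avoids division by zero.

```python
import numpy as np

def new_cases(x):
    """Daily new cases x_t - x_{t-1} from a cumulative series x."""
    x = np.asarray(x, dtype=float)
    return np.diff(x)

def growth_rate(x, k=1):
    """r_k(t) = (x_{t+k} - x_t) / (x_t + 1) for every t such that t + k is observed."""
    x = np.asarray(x, dtype=float)
    return (x[k:] - x[:-k]) / (x[:-k] + 1.0)

# Made-up cumulative counts, not real data.
cumulative = [0, 1, 3, 7, 15]
daily = new_cases(cumulative)       # [1, 2, 4, 8]
r1 = growth_rate(cumulative, k=1)   # [1, 1, 1, 1]
r2 = growth_rate(cumulative, k=2)   # [3, 3, 3]
```

In this toy series the count plus one doubles every day, so the one-day growth rate is constant at 1.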
Methodology
Related to the idea of “flattening the curve”, we consider the curve \(r_{1}^{(j)}(t)\), which captures how the growth rate changes over time. In addition, we smooth this signal to dampen sudden changes in notification (such as the weekend effect).
Objective: predict the growth rate at horizon \(k\) from the last 15 days of the growth rate \(H_1\):
$$R_{1}(0)=\{r_1^{(j)}(-14),\ldots,r_1^{(j)}(0)\}$$
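As a rough sketch of how the predictor \(R_{1}(0)\) could be assembled, the Python snippet below smooths a daily growth-rate series and keeps its last 15 values. The text does not say which smoother is used, so the trailing 7-day moving average here is purely an assumption, and the function names are hypothetical.

```python
import numpy as np

def smooth(r, width=7):
    """Trailing moving average (an assumed smoother, not the authors' choice).

    The first width - 1 entries average whatever history is available.
    """
    r = np.asarray(r, dtype=float)
    out = np.empty_like(r)
    for i in range(len(r)):
        out[i] = r[max(0, i - width + 1): i + 1].mean()
    return out

def last_window(r, length=15):
    """Return {r(-14), ..., r(0)}: the last `length` values of the series."""
    r = np.asarray(r, dtype=float)
    if len(r) < length:
        raise ValueError("series is shorter than the requested window")
    return r[-length:]

# Made-up growth-rate series of 20 days, not real data.
r1 = np.linspace(0.4, 0.02, 20)
window = last_window(smooth(r1))   # 15 smoothed values ending at day 0
```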
Algorithm steps:
- Filtering:
- Data from certain regions are excluded because of inconsistencies in the records.
- For the \(r_{k}^{(1)}(t)\) response (confirmed cases), we use the countries or regions with more than 200 confirmed cases at time \(t\).
- For the \(r_{k}^{(2)}(t)\) response (deaths), we use the countries or regions with more than 30 deaths at time \(t\).
- Fit the model. Three functional regression models are constructed, all of the general form
\(r_{k}^{(j)}(0) = f(R_{1}(0)) + \epsilon\); they differ in the form of \(f\):
- FLM: \(f\) is a linear function, \(f(R_{1}(0))= \int{R_{1}(t)\beta(t)dt}\).
- FNP: \(f\) is a nonparametric kernel estimate.
- SAM: \(f\) is an additive combination of smooth functions of the main functional principal components.
- Predictions:
- Re-estimate the functional models (Step 2) when new data become available (all countries and regions of Data1 and Data2).
- Reconstruct the expected number of cumulative cases and deduce the new cases for each horizon (confirmed and deaths).
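The algorithm steps can be sketched end to end in Python under some loudly stated assumptions. The filtering thresholds come from the text. The kernel predictor is a generic Nadaraya-Watson estimate with a Gaussian kernel on the \(L_2\) distance between discretized curves, used here as a stand-in for the FNP model and not necessarily the authors' implementation. The reconstruction simply inverts the growth-rate definition, \(\hat{x}_{t+k} = x_t + \hat{r}_k(t)\,(x_t + 1)\). All names, the bandwidth, and the toy numbers are hypothetical.

```python
import numpy as np

# Thresholds quoted in the filtering step: j=1 confirmed cases, j=2 deaths.
THRESHOLDS = {1: 200, 2: 30}

def eligible(cumulative_at_t, j):
    """Keep a region only if it has more than the threshold count at time t."""
    return cumulative_at_t > THRESHOLDS[j]

def kernel_predict(train_curves, train_responses, new_curve, bandwidth=0.5):
    """Nadaraya-Watson estimate of r_k(0) from a 15-day growth-rate curve.

    Assumed stand-in for a nonparametric kernel model; the bandwidth is arbitrary.
    If every curve is very far from new_curve the weights can underflow to zero.
    """
    X = np.asarray(train_curves, dtype=float)      # shape (n_regions, 15)
    y = np.asarray(train_responses, dtype=float)   # shape (n_regions,)
    z = np.asarray(new_curve, dtype=float)         # shape (15,)
    dist = np.sqrt(((X - z) ** 2).mean(axis=1))    # discretized L2 distance
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
    return float(np.sum(weights * y) / np.sum(weights))

def reconstruct_cumulative(x_t, r_hat):
    """Invert r_k(t) = (x_{t+k} - x_t) / (x_t + 1) to get the expected x_{t+k}."""
    return x_t + r_hat * (x_t + 1.0)

# Toy training set: two regions with constant curves and made-up responses.
curves = np.vstack([np.zeros(15), np.ones(15)])
responses = np.array([0.1, 0.3])

x_t = 999.0                        # current cumulative confirmed cases (made up)
if eligible(x_t, j=1):
    # The new curve is equidistant from both training curves,
    # so the prediction is the plain average of the two responses.
    r_hat = kernel_predict(curves, responses, np.full(15, 0.5))
    x_hat = reconstruct_cumulative(x_t, r_hat)
    new_over_horizon = x_hat - x_t  # expected new cases accumulated up to the horizon
```

Repeating the last two calls for each horizon \(k\) gives a sequence of expected cumulative counts; differencing consecutive values then yields the expected new cases per day.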
Funding
This work has been supported by Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and the European Regional Development Fund (ERDF), and by the IAP network StUDyS from Belgian Science Policy.
Acknowledgements
Thanks to Diego Campanario for creating the Shiny server.
Data issues in regions of Spain (CCAA)
The file obtained from Instituto de Salud Carlos III (ISCIII) has undergone changes over time in the units of its variables. Typically, the historical data are not reconstructed after such a change.