Code for "Estimating internationally imported cases during the early COVID-19 pandemic" by Menkir et al. 2020
The project requires the following R packages:
c("Bolstad2", "broom", "caret", "cartogram",
"data.table", "dplyr", "dvmisc", "formattable",
"gam", "ggmap", "ggplot2", "ggpubr", "ggthemes", "grid", "gridExtra",
"janitor", "kableExtra", "knitr", "leaflet", "lubridate", "maptools",
"pals", "patchwork", "purrr", "RColorBrewer",
"readr", "reshape2", "rgdal", "rio", "scales", "spData", "stats",
"stringi", "stringr", "tidyr", "tidyverse", "tmap", "zoo")
External data are required from a number of sources:
- MIDAS cumulative case data: province- and city-level cumulative case data from China. These should be downloaded from the MIDAS git repo.
- MIDAS province-level daily case counts in China. These should also be downloaded from the MIDAS git repo.
- ECDC global confirmed case data.
- Code and MCMC outputs from Tsang et al. 2020, DOI: https://doi.org/10.1016/S2468-2667(20)30089-X. Note that you will need to clone the associated git repo.
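Before running anything, it can help to check which of the R packages listed above are already installed. The following is a minimal sketch, not part of the repo: the helper name find_missing is hypothetical, and the install step is left commented out so that nothing is installed by accident.

```r
# Package list copied from this README.
required_pkgs <- c(
  "Bolstad2", "broom", "caret", "cartogram",
  "data.table", "dplyr", "dvmisc", "formattable",
  "gam", "ggmap", "ggplot2", "ggpubr", "ggthemes", "grid", "gridExtra",
  "janitor", "kableExtra", "knitr", "leaflet", "lubridate", "maptools",
  "pals", "patchwork", "purrr", "RColorBrewer",
  "readr", "reshape2", "rgdal", "rio", "scales", "spData", "stats",
  "stringi", "stringr", "tidyr", "tidyverse", "tmap", "zoo"
)

# Hypothetical helper: return the packages in `pkgs` that are not installed.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_pkgs <- find_missing(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

Note that stats and grid ship with R and never need installing. Some older spatial packages in the list (for example rgdal and maptools) may no longer be available from CRAN and might have to be installed from an archive.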
This repo is split into folders for data, code, scripts, figures and out.
- data: provides pre-computed and generated data for the analyses.
- code: stores auxiliary R functions.
- scripts: the main folder containing all scripts used to generate figures and analyses. Note that this is split into pre_computation scripts, which can be run should the user wish to reproduce data files from scratch. Otherwise, the scripts labeled 1-6 are sufficient to reproduce the analyses.
- figures: folder to store generated figures and panels.
- out: folder to store larger, intermediate data files.
- Source the file scripts/headers.R before running any other scripts. NOTE: you should change the main_wd object to the full file path to where this repo is stored. This script contains all libraries and global settings for subsequent scripts.
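The sourcing step can be sketched as follows. This is an illustration only: source_headers is a hypothetical helper that is not part of the repo, and the path in the usage line is a placeholder. The helper checks that scripts/headers.R exists under the given repo path before sourcing it, so a wrong path fails with a clear message.

```r
# Hypothetical helper: source scripts/headers.R from a local copy of the repo.
source_headers <- function(repo_path) {
  headers <- file.path(repo_path, "scripts", "headers.R")
  if (!file.exists(headers)) {
    stop("Cannot find ", headers, "; check the repo path.")
  }
  # source() evaluates the file in the global environment by default,
  # so libraries and settings become available to later scripts.
  source(headers)
  invisible(headers)
}

# Usage (placeholder path, adjust to where the repo is stored):
# source_headers("/full/path/to/this/repo")
```

The main_wd object still has to be edited inside scripts/headers.R itself, as described above; the helper does not set it.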
EXAMPLE: the script EXAMPLE_flight_volume_estimation.R provides the code we used to generate flight volume data. However, the original source files are proprietary and cannot be shared, though all of the necessary data are available in data/flights_all_cities2.csv. This code is included for transparency only and cannot be run.
The pre-computation scripts are:
- 1.ECDC_case_data.R: produces the files data/hasdetected.Rdata and data/df_country_cont_ecdc.Rdata.
- 2.extract_midas_data.R: extracts MIDAS case count data and produces a clean data file.
- 3.apportion_tsang_incidence.R: apportions incidence as per the Tsang et al. analysis for Scenario 2.
- 4.apportion_province_cases_to_cities.R: apportions incidence to Chinese cities by different methods (population, % of cases).
- 5.calculate_ascertainment_rate_ratio.R: calculates the ascertainment rate ratio between Wuhan and non-Wuhan cities.
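To illustrate the idea of apportioning province-level cases to cities by population, here is a minimal sketch. It is not the code in 4.apportion_province_cases_to_cities.R: the function name, the column names and all numbers are made up, and it assumes each city simply receives its province's cases multiplied by the city's share of the province population.

```r
# Made-up province-level case counts.
province_cases <- data.frame(
  province = c("A", "B"),
  cases = c(100, 40)
)

# Made-up population fractions of cities relative to their province.
city_frac <- data.frame(
  province = c("A", "A", "B"),
  city = c("a1", "a2", "b1"),
  frac_popn = c(0.25, 0.10, 0.50)
)

# Hypothetical helper: city cases = province cases * city population fraction.
apportion_by_population <- function(province_cases, city_frac) {
  merged <- merge(city_frac, province_cases, by = "province")
  merged$city_cases <- merged$cases * merged$frac_popn
  merged[, c("province", "city", "city_cases")]
}

city_cases <- apportion_by_population(province_cases, city_frac)
print(city_cases)
```

Under these assumptions, city a1 receives a quarter of province A's 100 cases. The fractions need not sum to one within a province, because the listed cities may cover only part of the province population.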
The main scripts, labeled 1-6, are:
- 1.generate_prevalence_estimates.R: generates time-varying daily prevalence for each province and city for each of the scenarios, incorporating the assumptions specified in data/scenario_key.xlsx.
- 2.create_master_table.R: creates a large tibble used to compute all numbers and create all figures in the text.
- 3.all_numbers.R: computes all numbers shown in the manuscript. Note that the main scenario object needs to be changed to investigate particular scenarios in more detail.
- 4.import_figures.R: creates all panels for Figures 2, 3, S4 and S5.
- 5.prevalence_figures.R: produces panels for Figures 1, S2 and S3.
- 6.flight_volume_figure.R: produces panels for Figure S1.
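The six main scripts can be run one after another. The sketch below shows one way to do this, under the assumption that they sit directly in the scripts folder and that scripts/headers.R has already been sourced. The helper run_scripts is hypothetical, and the usage line reuses main_wd from headers.R.

```r
# Main script names as listed above.
main_scripts <- c(
  "1.generate_prevalence_estimates.R",
  "2.create_master_table.R",
  "3.all_numbers.R",
  "4.import_figures.R",
  "5.prevalence_figures.R",
  "6.flight_volume_figure.R"
)

# Hypothetical helper: source the given scripts from `script_dir` in order,
# stopping early if any of them is missing.
run_scripts <- function(script_dir, scripts) {
  paths <- file.path(script_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p)
  }
  invisible(paths)
}

# Usage (assumes main_wd was set by scripts/headers.R):
# run_scripts(file.path(main_wd, "scripts"), main_scripts)
```

Checking for missing files up front avoids a partial run that fails halfway through the sequence.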
Data files and folders:
- who_imports.csv - contains reported COVID-19 case imports from Wuhan pre-lockdown
- midas - folder containing data files needed to extract case counts from MIDAS git repo
- midas/midas_data_final.csv - contains confirmed COVID-19 case counts for Chinese provinces
- digitize_verity.csv - contains the ascertainment rates per age-category digitized from Verity et al. (Lancet Infect Dis 2020)
- cn_iata_code_shortlist.csv - contains the names, IATA codes, and city names of Chinese origin airports
- provinces_popn_size_statista.csv - contains the population sizes of Chinese provinces
- df_city_pop.Rdata - contains the population sizes of Chinese cities
- df_country_cont_ecdc.Rdata - contains a key for destination countries and continents
- frac_popn_city.Rdata - contains the population fractions of Chinese cities relative to their respective provinces
- tsang_predictions_apportioned.csv - contains the COVID-19 incidence estimates from Tsang et al. (Lancet Public Health 2020)
- flights_all_cities2.csv - contains flight volume between Chinese origin cities and international destinations by calendar day
- 2019_flight_line.csv - contains total flight volume between Chinese origin cities and international destinations by calendar day in 2019
- hasdetected.Rdata - contains an indicator for when an African location has detected a COVID-19 case
- master_table_2910.csv - <<MUST BE GENERATED>> contains the prevalence indicators under all scenarios and the flight volume estimates in one large data frame, which is used for all number calculations and figures in the manuscript
- tsang2020 - folder containing files to extract incidence predictions for Scenario 2 as per Tsang et al.
- scenario_key.xlsx - key giving descriptions and names to each of the 9 scenarios.
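The files above come in two main formats, .csv and .Rdata. As a rough sketch of how they might be inspected in base R, the hypothetical helper read_data_file below reads a .csv file into a data frame and loads a .Rdata file into a named list of the objects it contains. It is not part of the repo and does not handle scenario_key.xlsx.

```r
# Hypothetical helper: read a .csv or .Rdata file by its extension.
read_data_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    # Returns a data frame.
    read.csv(path, stringsAsFactors = FALSE)
  } else if (ext == "rdata") {
    # Load into a fresh environment so existing objects are not overwritten,
    # then return the loaded objects as a named list.
    env <- new.env()
    load(path, envir = env)
    as.list(env)
  } else {
    stop("Unsupported file type: ", ext)
  }
}

# Usage (paths relative to the repo root):
# flights <- read_data_file("data/flights_all_cities2.csv")
# detected <- read_data_file("data/hasdetected.Rdata")
```

Loading .Rdata files into a separate environment makes it easy to see which object names each file defines before they are used.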