diff --git a/docs/influenza_1718.md b/docs/influenza_1718.md
index 1a25980..1d1d299 100644
--- a/docs/influenza_1718.md
+++ b/docs/influenza_1718.md
@@ -15,20 +15,20 @@ located in `~/tutorials/influenza_1718/`.
 
 ## Data
 
-Data on the weekly incidence of visits to General Practictioners (GP) are made publically available by the Belgian Scientific Institute of Public Health (Sciensano). These data were retrieved from the "End of season" report on Influenza in Belgium (see `data/raw/Influenza 2017-2018 End of Season_NL.pdf`). Using [Webplotdigitizer](https://automeris.io/WebPlotDigitizer/), the weekly number of GP visits in the different age groups were extracted (see `data/raw/dataset_influenza_1718.csv`). Then, the script `data_conversion.py` was used to convert the *raw* weekly incidence of Influenza cases in Belgium (per 100K inhabitats) during the 2017-2018 Influenza season into a better suited format. The weekly incidence was first converted to the absolute GP visits count by multipying with the demographics. Then, it was assumed that the weekly incidence was the sum of seven equal counts throughout the week preceding the data collection, and hence the weekly data were divided by seven. The formatted data are located in `data/interim/data_influenza_1718_format.csv`.
+Data on the weekly incidence of visits to General Practitioners (GP) for Influenza-like illness (ILI) are made publicly available by the Belgian Scientific Institute of Public Health (Sciensano). These data were retrieved from the "End of season" report on Influenza in Belgium (see `data/raw/Influenza 2017-2018 End of Season_NL.pdf`). Using [Webplotdigitizer](https://automeris.io/WebPlotDigitizer/), the weekly number of GP visits in the different age groups was extracted (see `data/raw/ILI_weekly_1718_raw.csv`). Then, the script `data_conversion.py` was used to convert the *raw* weekly incidence of GP visits for ILI in Belgium (per 100K inhabitants) during the 2017-2018 Influenza season into a better suited format. The week numbers in the raw dataset were replaced with the date of that week's Friday (the week's Monday plus four days), as an approximation of the midpoint of the week. Further, the number of GP visits per 100K inhabitants was converted to the absolute number of GP visits by multiplying with the size of each age group. The formatted data are located in `data/interim/ILI_weekly_100K.csv` (per 100K inhabitants) and `data/interim/ILI_weekly_ABS.csv` (absolute numbers).
 
 ![data](/_static/figs/influenza_1718/data.png)
 
-The data are loaded in our calibration script `~/tutorials/influenza_1718/calibration.py` as a `pd.DataFrame` with a `pd.Multiindex`. The `time`/`date` axis is obligatory. The other index names and values are the same as the model's dimensions and coordinates. In this way, pySODM recognizes how model prediction and dataset must be aligned.
+The absolute weekly numbers of GP visits are loaded in our calibration script `~/tutorials/influenza_1718/calibration.py` as a `pd.Series` with a `pd.MultiIndex`. The weekly number of GP visits is divided by seven to approximate the daily incidence at the week's midpoint, which is what we use to calibrate our model. The `time`/`date` index level is obligatory. The other index levels carry the same names and values as the model's dimensions and coordinates. In this way, pySODM recognizes how model prediction and dataset must be aligned.
 
 ```bash
 date        age_group
-2017-11-27  (0, 5]        15.373303
-            (5, 15]       13.462340
-            (15, 65]     409.713333
-            (65, 120]     33.502705
+2017-12-01  (0, 5]        15.727305
+            (5, 15]       13.240385
+            (15, 65]     407.778693
+            (65, 120]     32.379271
                                 ...
-2018-05-07  (0, 5]         0.000000
+2018-05-11  (0, 5]         0.000000
             (5, 15]        0.000000
             (15, 65]       0.000000
             (65, 120]      0.000000
diff --git a/tutorials/influenza_1718/README.md b/tutorials/influenza_1718/README.md
index d02ada2..89b20f3 100644
--- a/tutorials/influenza_1718/README.md
+++ b/tutorials/influenza_1718/README.md
@@ -30,6 +30,6 @@ First, by running the script `data_conversion.py`, the user converts the *raw* w
 
 ### Interim
 
-+ `data_influenza_1718_format.csv`: Daily incidence of Influenza cases in Belgium during the 2017-2018 Influenza season. Data available for four age groups: [0,5(, [5,15(, [15,65(, [65,120(. Generated from `dataset_influenza_1718.csv` by executing the data conversion script `data_conversion.py`.
-
++ `ILI_weekly_100K.csv`: Weekly incidence of GP visits for Influenza-like illness per 100K inhabitants in Belgium during the 2017-2018 season. Data available for four age groups: [0,5(, [5,15(, [15,65(, [65,120(. Dates are the Friday of the reported week, approximating the week's midpoint. Generated from `ILI_weekly_1718_raw.csv` by executing the data conversion script `data_conversion.py`.
++ `ILI_weekly_ABS.csv`: Weekly absolute number of GP visits for Influenza-like illness in Belgium during the 2017-2018 season. Data available for four age groups: [0,5(, [5,15(, [15,65(, [65,120(. Dates are the Friday of the reported week, approximating the week's midpoint. Generated from `ILI_weekly_1718_raw.csv` by executing the data conversion script `data_conversion.py`.
 
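The conversion steps described in the documentation above (week number to date, incidence per 100K to absolute counts, weekly total to a daily value) can be sketched in plain Python. This is a minimal, hypothetical illustration and not the code of `data_conversion.py` or `calibration.py`: the helper names are made up, only the standard library is used, and it assumes (unverified) that the raw `YEAR`/`WEEK` columns follow ISO 8601 week numbering. The example numbers are the 2017-12-01 entry for the [0, 5) age group in `ILI_weekly_100K.csv` and the matching population of 620914 hardcoded in `calibration.py`.

```python
from datetime import date, datetime, timedelta

# Hypothetical helpers sketching the conversion steps; names are illustrative only.


def strptime_week_date(year: int, week: int, days_after_monday: int) -> date:
    """Monday of `%W` week `week` in `year`, shifted by `days_after_monday` days.

    Mirrors the strptime call in data_conversion.py, which uses an offset of 4.
    """
    monday = datetime.strptime(f"{year}-W{week}-1", "%Y-W%W-%w").date()
    return monday + timedelta(days=days_after_monday)


def iso_week_thursday(year: int, week: int) -> date:
    """Thursday (ISO weekday 4) of ISO 8601 week `week` in `year` (Python >= 3.8)."""
    return date.fromisocalendar(year, week, 4)


def per_100k_to_absolute(incidence_100k: float, population: int) -> float:
    """Convert an incidence per 100K inhabitants into an absolute count."""
    return incidence_100k * population / 100_000


def weekly_to_daily(weekly_count: float) -> float:
    """Spread a weekly total evenly over seven days."""
    return weekly_count / 7


if __name__ == "__main__":
    # ISO week 40 of 2017 runs from Monday 2017-10-02 to Sunday 2017-10-08.
    print(strptime_week_date(2017, 40, 4))  # a Friday, as in the interim CSV files
    print(iso_week_thursday(2017, 40))      # a Thursday, the midpoint of a Mon-Sun week

    # 2017-12-01, age group [0, 5): GP visits per 100K and the age group's size.
    weekly_abs = per_100k_to_absolute(17.7304964539012, 620914)
    print(round(weekly_abs, 6), round(weekly_to_daily(weekly_abs), 6))
```

The last line reproduces the absolute weekly count in `ILI_weekly_ABS.csv` and, after division by seven, the daily value shown in the documentation. Note that `%W` counts weeks from the first Monday of the calendar year, so it only coincides with ISO week numbers in years where 1 January falls on a Friday, Saturday, Sunday or Monday. 2017 and 2018 satisfy this, but 2019 does not (`%W` week 1 starts on 2019-01-07, ISO week 1 on 2018-12-31), so `date.fromisocalendar` may be the safer choice if the script is reused for other seasons.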
diff --git a/tutorials/influenza_1718/calibration.py b/tutorials/influenza_1718/calibration.py index 7e32e51..9d364c4 100644 --- a/tutorials/influenza_1718/calibration.py +++ b/tutorials/influenza_1718/calibration.py @@ -32,41 +32,41 @@ tau = 0.50 # Timestep of Tau-Leaping algorithm alpha = 0.03 # Overdispersion factor (based on COVID-19) -end_calibration = pd.Timestamp('2018-03-01') # Enddate of calibration +end_calibration = '2018-03-01' # Enddate of calibration identifier = 'twallema_2018-03-01' # Give any output of this script an ID -n_pso = 3 # Number of PSO iterations -multiplier_pso = 36 # PSO swarm size -n_mcmc = 50 # Number of MCMC iterations -multiplier_mcmc = 36 # Total number of Markov chains = number of parameters * multiplier_mcmc +n_pso = 30 # Number of PSO iterations +multiplier_pso = 10 # PSO swarm size +n_mcmc = 500 # Number of MCMC iterations +multiplier_mcmc = 10 # Total number of Markov chains = number of parameters * multiplier_mcmc print_n = 100 # Print diagnostics every print_n iterations -discard = 1000 # Discard first `discard` iterations as burn-in +discard = 50 # Discard first `discard` iterations as burn-in thin = 10 # Thinning factor emcee chains n = 100 # Repeated simulations used in visualisations -processes = int(os.getenv('SLURM_CPUS_ON_NODE', mp.cpu_count()/2)) # Automatically use half the number of available threads (typically corresponds to number of physical CPU cores) +processes = int(os.getenv('SLURM_CPUS_ON_NODE', mp.cpu_count())) # Retrieve CPU count ############### ## Load data ## ############### # Load case data -data = pd.read_csv(os.path.join(os.path.dirname(__file__),'data/interim/data_influenza_1718_format.csv'), index_col=[0,1], parse_dates=True, date_format='%Y-%m-%d') +data = pd.read_csv(os.path.join(os.path.dirname(__file__),'data/interim/ILI_weekly_ABS.csv'), index_col=[0,1], parse_dates=True, date_format='%Y-%m-%d') data = data.squeeze() # Load case data per 100K -data_100K = 
pd.read_csv(os.path.join(os.path.dirname(__file__),'data/interim/data_influenza_1718_format_100K.csv'), index_col=[0,1], parse_dates=True, date_format='%Y-%m-%d') +data_100K = pd.read_csv(os.path.join(os.path.dirname(__file__),'data/interim/ILI_weekly_100K.csv'), index_col=[0,1], parse_dates=True, date_format='%Y-%m-%d') data_100K = data_100K.squeeze() # Re-insert pd.IntervalIndex (pd.IntervalIndex is always loaded as a string..) age_groups = pd.IntervalIndex.from_tuples([(0,5),(5,15),(15,65),(65,120)], closed='left') iterables = [data.index.get_level_values('DATE').unique(), age_groups] names = ['date', 'age_group'] index = pd.MultiIndex.from_product(iterables, names=names) -df_influenza = pd.Series(index=index, name='CASES', data=data.values) -df_influenza_100K = pd.Series(index=index, name='CASES', data=data_100K.values) +df_influenza = pd.Series(index=index, name='CASES', data=data.values)/7 # convert weekly cumulative to daily week midpoint +df_influenza_100K = pd.Series(index=index, name='CASES', data=data_100K.values)/7 # convert weekly cumulative to daily week midpoint # Extract start and enddate -start_date = df_influenza.index.get_level_values('date').unique()[8] -end_date = df_influenza.index.get_level_values('date').unique()[-1] -start_calibration = start_date -# Hardcode Belgian demographics -initN = pd.Series(index=age_groups, data=np.array([606938, 1328733, 7352492, 2204478])) +start_visualisation = df_influenza.index.get_level_values('date').unique()[8].strftime("%Y-%m-%d") +end_visualisation = df_influenza.index.get_level_values('date').unique()[-1].strftime("%Y-%m-%d") +start_calibration = start_visualisation +# Hardcode Belgian demographics (Jan 1, 2018) +initN = pd.Series(index=age_groups, data=[620914, 1306826, 7317774, 2130556]) ################ ## Load model ## @@ -160,7 +160,7 @@ # Assign results to model model.parameters = assign_theta(model.parameters, pars, theta) # Simulate model - out = model.sim([start_calibration, end_date], 
samples={}, N=n, tau=tau, output_timestep=1) + out = model.sim([start_calibration, end_visualisation], samples={}, N=n, tau=tau, output_timestep=1) # Add poisson obervational noise out = add_negative_binomial_noise(out, alpha) # Visualize @@ -169,7 +169,7 @@ for id, age_class in enumerate(df_influenza.index.get_level_values('age_group').unique()): # Data axs[id].scatter(df_influenza[start_calibration:end_calibration].index.get_level_values('date').unique(), df_influenza.loc[slice(start_calibration,end_calibration), age_class], color='black', alpha=0.3, linestyle='None', facecolors='None', s=60, linewidth=2, label='observed') - axs[id].scatter(df_influenza[end_calibration:end_date].index.get_level_values('date').unique(), df_influenza.loc[slice(end_calibration,end_date), age_class], color='red', alpha=0.3, linestyle='None', facecolors='None', s=60, linewidth=2, label='unobserved') + axs[id].scatter(df_influenza[end_calibration:end_visualisation].index.get_level_values('date').unique(), df_influenza.loc[slice(end_calibration,end_visualisation), age_class], color='red', alpha=0.3, linestyle='None', facecolors='None', s=60, linewidth=2, label='unobserved') # Model trajectories for i in range(n): axs[id].plot(out['date'],out['Im_inc'].sel(age_group=age_class).isel(draws=i), color='blue', alpha=0.05, linewidth=1) @@ -195,18 +195,12 @@ # Perturbate previously obtained estimate ndim, nwalkers, pos = perturbate_theta(theta, pert=0.30*np.ones(len(theta)), multiplier=multiplier_mcmc, bounds=expanded_bounds) # Write some usefull settings to a pickle file (no pd.Timestamps or np.arrays allowed!) 
- settings={'start_calibration': start_calibration.strftime("%Y-%m-%d"), 'end_calibration': end_calibration.strftime("%Y-%m-%d"), + settings={'start_calibration': start_calibration, 'end_calibration': end_calibration, 'n_chains': nwalkers, 'starting_estimate': list(theta), 'labels': expanded_labels, 'tau': tau} # Sample n_mcmc iterations sampler = run_EnsembleSampler(pos, n_mcmc, identifier, objective_function, objective_function_kwargs={'simulation_kwargs': {'warmup': 0, 'tau':tau}}, fig_path=fig_path, samples_path=samples_path, print_n=print_n, backend=None, processes=processes, progress=True, - settings_dict=settings) - # Sample 5*n_mcmc more - for i in range(3): - backend = emcee.backends.HDFBackend(os.path.join(os.getcwd(),samples_path+identifier+'_BACKEND_'+run_date+'.hdf5')) - sampler = run_EnsembleSampler(pos, n_mcmc, identifier, objective_function, objective_function_kwargs={'simulation_kwargs': {'warmup': 0, 'tau':tau}}, - fig_path=fig_path, samples_path=samples_path, print_n=print_n, backend=backend, processes=processes, progress=True, - settings_dict=settings) + settings_dict=settings) # Generate a sample dictionary and save it as .json for long-term storage # Have a look at the script `emcee_sampler_to_dictionary.py`, which does the same thing as the function below but can be used while your MCMC is running. 
samples_dict = emcee_sampler_to_dictionary(samples_path, identifier, discard=discard, thin=thin) @@ -229,7 +223,7 @@ def draw_fcn(param_dict, samples_dict): param_dict['f_ud'] = np.array([slice[idx] for slice in samples_dict['f_ud']]) return param_dict # Simulate model - out = model.sim([start_date, end_date], N=n, tau=tau, output_timestep=1, samples=samples_dict, draw_function=draw_fcn, processes=processes) + out = model.sim([start_visualisation, end_visualisation], N=n, tau=tau, output_timestep=1, samples=samples_dict, draw_function=draw_fcn, processes=processes) # Add negative binomial observation noise out_noise = add_negative_binomial_noise(out, alpha) @@ -243,7 +237,7 @@ def draw_fcn(param_dict, samples_dict): for id, age_class in enumerate(df_influenza_100K.index.get_level_values('age_group').unique()): # Data axs[id].plot(df_influenza_100K[start_calibration:end_calibration].index.get_level_values('date').unique(), df_influenza_100K.loc[slice(start_calibration,end_calibration), age_class]*7, color='black', marker='o', label='Observed data') - axs[id].plot(df_influenza_100K[end_calibration-pd.Timedelta(days=7):end_date].index.get_level_values('date').unique(), df_influenza_100K.loc[slice(end_calibration-pd.Timedelta(days=7),end_date), age_class]*7, color='red', marker='o', label='Unobserved data') + axs[id].plot(df_influenza_100K[pd.Timestamp(end_calibration)-pd.Timedelta(days=7):end_visualisation].index.get_level_values('date').unique(), df_influenza_100K.loc[slice(pd.Timestamp(end_calibration)-pd.Timedelta(days=7),end_visualisation), age_class]*7, color='red', marker='o', label='Unobserved data') # Model trajectories axs[id].plot(out['date'],out['Im_inc'].sel(age_group=age_class).mean(dim='draws')/initN.loc[age_class]*100000*7, color='black', linestyle='--', alpha=0.7, linewidth=1, label='Model mean') axs[id].fill_between(out_noise['date'].values,out_noise['Im_inc'].sel(age_group=age_class).quantile(dim='draws', q=0.025)/initN.loc[age_class]*100000*7, 
out_noise['Im_inc'].sel(age_group=age_class).quantile(dim='draws', q=0.975)/initN.loc[age_class]*100000*7, color='black', alpha=0.15, label='Model 95% CI') diff --git a/tutorials/influenza_1718/data/interim/ILI_weekly_100K.csv b/tutorials/influenza_1718/data/interim/ILI_weekly_100K.csv new file mode 100644 index 0000000..27da109 --- /dev/null +++ b/tutorials/influenza_1718/data/interim/ILI_weekly_100K.csv @@ -0,0 +1,129 @@ +DATE,AGE,CASES_100K +2017-10-06,"[0, 5)",3.53073360514691 +2017-10-06,"[5, 15)",6.31047999020029 +2017-10-06,"[15, 65)",14.5889596615352 +2017-10-06,"[65, 120)",11.9210107267218 +2017-10-13,"[0, 5)",16.9737694017181 +2017-10-13,"[5, 15)",8.69528973038268 +2017-10-13,"[15, 65)",28.0775650709766 +2017-10-13,"[65, 120)",11.5929095140596 +2017-10-20,"[0, 5)",13.9357952104024 +2017-10-20,"[5, 15)",5.64212566811102 +2017-10-20,"[15, 65)",16.6851618535429 +2017-10-20,"[65, 120)",3.32475895497555 +2017-10-27,"[0, 5)",35.7180701621351 +2017-10-27,"[5, 15)",2.54339199296919 +2017-10-27,"[15, 65)",21.9104774626053 +2017-10-27,"[65, 120)",18.8986298493355 +2017-11-03,"[0, 5)",4.56246232077729 +2017-11-03,"[5, 15)",19.2275807263673 +2017-11-03,"[15, 65)",13.894810397062 +2017-11-03,"[65, 120)",3.22926973845097 +2017-11-10,"[0, 5)",17.8943881440409 +2017-11-10,"[5, 15)",16.5611955617146 +2017-11-10,"[15, 65)",15.2280029793883 +2017-11-10,"[65, 120)",7.22884748542992 +2017-11-17,"[0, 5)",35.225891714284 +2017-11-17,"[5, 15)",17.8943881440409 +2017-11-17,"[15, 65)",21.8939658910203 +2017-11-17,"[65, 120)",16.5611955617146 +2017-11-24,"[0, 5)",21.8939658910203 +2017-11-24,"[5, 15)",0.562884573797874 +2017-11-24,"[15, 65)",29.8931213849783 +2017-11-24,"[65, 120)",29.8931213849783 +2017-12-01,"[0, 5)",17.7304964539012 +2017-12-01,"[5, 15)",7.09219858156075 +2017-12-01,"[15, 65)",39.0070921985816 +2017-12-01,"[65, 120)",10.6382978723409 +2017-12-08,"[0, 5)",92.1985815602839 +2017-12-08,"[5, 15)",63.8297872340427 +2017-12-08,"[15, 65)",42.5531914893618 
+2017-12-08,"[65, 120)",10.6382978723409 +2017-12-15,"[0, 5)",17.7304964539012 +2017-12-15,"[5, 15)",81.5602836879434 +2017-12-15,"[15, 65)",46.0992907801419 +2017-12-15,"[65, 120)",10.6382978723409 +2017-12-22,"[0, 5)",0.0 +2017-12-22,"[5, 15)",21.2765957446813 +2017-12-22,"[15, 65)",67.3758865248228 +2017-12-22,"[65, 120)",60.2836879432625 +2017-12-29,"[0, 5)",138.297872340426 +2017-12-29,"[5, 15)",3.5460992907806 +2017-12-29,"[15, 65)",78.0141843971633 +2017-12-29,"[65, 120)",21.2765957446813 +2018-01-05,"[0, 5)",297.872340425532 +2018-01-05,"[5, 15)",81.1914893617022 +2018-01-05,"[15, 65)",209.219858156029 +2018-01-05,"[65, 120)",85.10638297872359 +2018-01-12,"[0, 5)",329.787234042553 +2018-01-12,"[5, 15)",134.751773049646 +2018-01-12,"[15, 65)",177.304964539007 +2018-01-12,"[65, 120)",63.8297872340427 +2018-01-19,"[0, 5)",404.255319148936 +2018-01-19,"[5, 15)",336.879432624114 +2018-01-19,"[15, 65)",265.957446808511 +2018-01-19,"[65, 120)",88.6524822695037 +2018-01-26,"[0, 5)",475.177304964539 +2018-01-26,"[5, 15)",716.312056737589 +2018-01-26,"[15, 65)",326.241134751773 +2018-01-26,"[65, 120)",67.3758865248228 +2018-02-02,"[0, 5)",599.290780141844 +2018-02-02,"[5, 15)",656.028368794327 +2018-02-02,"[15, 65)",460.992907801419 +2018-02-02,"[65, 120)",134.751773049646 +2018-02-09,"[0, 5)",797.872340425532 +2018-02-09,"[5, 15)",1028.36879432624 +2018-02-09,"[15, 65)",581.560283687943 +2018-02-09,"[65, 120)",209.219858156028 +2018-02-16,"[0, 5)",1230.49645390071 +2018-02-16,"[5, 15)",535.460992907801 +2018-02-16,"[15, 65)",620.567375886525 +2018-02-16,"[65, 120)",223.404255319149 +2018-02-23,"[0, 5)",556.737588652482 +2018-02-23,"[5, 15)",570.921985815603 +2018-02-23,"[15, 65)",748.22695035461 +2018-02-23,"[65, 120)",297.872340425532 +2018-03-02,"[0, 5)",943.262411347518 +2018-03-02,"[5, 15)",819.148936170213 +2018-03-02,"[15, 65)",684.397163120568 +2018-03-02,"[65, 120)",283.687943262411 +2018-03-09,"[0, 5)",982.2695035461 +2018-03-09,"[5, 15)",1244.68085106383 
+2018-03-09,"[15, 65)",769.503546099291 +2018-03-09,"[65, 120)",304.964539007092 +2018-03-16,"[0, 5)",921.985815602837 +2018-03-16,"[5, 15)",329.787234042553 +2018-03-16,"[15, 65)",372.340425531915 +2018-03-16,"[65, 120)",237.588652482269 +2018-03-23,"[0, 5)",613.475177304965 +2018-03-23,"[5, 15)",861.702127659575 +2018-03-23,"[15, 65)",223.404255319149 +2018-03-23,"[65, 120)",195.035460992908 +2018-03-30,"[0, 5)",507.092198581561 +2018-03-30,"[5, 15)",170.212765957447 +2018-03-30,"[15, 65)",78.0141843971633 +2018-03-30,"[65, 120)",106.382978723404 +2018-04-06,"[0, 5)",106.382978723404 +2018-04-06,"[5, 15)",70.921985815603 +2018-04-06,"[15, 65)",24.8226950354615 +2018-04-06,"[65, 120)",28.3687943262412 +2018-04-13,"[0, 5)",99.2907801418442 +2018-04-13,"[5, 15)",35.4609929078015 +2018-04-13,"[15, 65)",14.184397163121 +2018-04-13,"[65, 120)",10.6382978723409 +2018-04-20,"[0, 5)",0.0 +2018-04-20,"[5, 15)",0.0 +2018-04-20,"[15, 65)",10.6382978723409 +2018-04-20,"[65, 120)",0.0 +2018-04-27,"[0, 5)",0.0 +2018-04-27,"[5, 15)",0.0 +2018-04-27,"[15, 65)",3.5460992907806 +2018-04-27,"[65, 120)",0.0 +2018-05-04,"[0, 5)",0.0 +2018-05-04,"[5, 15)",0.0 +2018-05-04,"[15, 65)",3.5460992907806 +2018-05-04,"[65, 120)",3.5460992907806 +2018-05-11,"[0, 5)",0.0 +2018-05-11,"[5, 15)",0.0 +2018-05-11,"[15, 65)",0.0 +2018-05-11,"[65, 120)",0.0 diff --git a/tutorials/influenza_1718/data/interim/ILI_weekly_ABS.csv b/tutorials/influenza_1718/data/interim/ILI_weekly_ABS.csv new file mode 100644 index 0000000..8ba3d40 --- /dev/null +++ b/tutorials/influenza_1718/data/interim/ILI_weekly_ABS.csv @@ -0,0 +1,129 @@ +DATE,AGE,CASES_100K +2017-10-06,"[0, 5)",21.922819257061885 +2017-10-06,"[5, 15)",82.46699323673484 +2017-10-06,"[15, 65)",1067.5870969823109 +2017-10-06,"[65, 120)",253.98380929881492 +2017-10-13,"[0, 5)",105.39251054298393 +2017-10-13,"[5, 15)",113.63230697197076 +2017-10-13,"[15, 65)",2054.6527565970073 +2017-10-13,"[65, 120)",246.99342922636765 +2017-10-20,"[0, 
5)",86.52930347271797 +2017-10-20,"[5, 15)",73.73276518354852 +2017-10-20,"[15, 65)",1220.9824359764805 +2017-10-20,"[65, 120)",70.83585140076887 +2017-10-27,"[0, 5)",221.7784981665195 +2017-10-27,"[5, 15)",33.23770784603955 +2017-10-27,"[15, 65)",1603.3592230343904 +2017-10-27,"[65, 120)",402.64589217280843 +2017-11-03,"[0, 5)",28.328967294431106 +2017-11-03,"[5, 15)",251.27102410315675 +2017-11-03,"[15, 65)",1016.7908225854999 +2017-11-03,"[65, 120)",68.80140016875146 +2017-11-10,"[0, 5)",111.1087612006901 +2017-11-10,"[5, 15)",216.42600951133247 +2017-11-10,"[15, 65)",1114.3508427449024 +2017-11-10,"[65, 120)",154.0146438316763 +2017-11-17,"[0, 5)",218.72249327882938 +2017-11-17,"[5, 15)",233.84851680724393 +2017-11-17,"[15, 65)",1602.1509435419518 +2017-11-17,"[65, 120)",352.8455457118441 +2017-11-24,"[0, 5)",135.9426993725698 +2017-11-24,"[5, 15)",7.355921960379805 +2017-11-24,"[15, 65)",2187.511064498382 +2017-11-24,"[65, 120)",636.8896912549383 +2017-12-01,"[0, 5)",110.0911347517761 +2017-12-01,"[5, 15)",92.68269503546708 +2017-12-01,"[15, 65)",2854.4508510638325 +2017-12-01,"[65, 120)",226.65489361703138 +2017-12-08,"[0, 5)",572.4739007092211 +2017-12-08,"[5, 15)",834.1442553191508 +2017-12-08,"[15, 65)",3113.946382978731 +2017-12-08,"[65, 120)",226.65489361703138 +2017-12-15,"[0, 5)",110.0911347517761 +2017-12-15,"[5, 15)",1065.8509929078032 +2017-12-15,"[15, 65)",3373.4419148936213 +2017-12-15,"[65, 120)",226.65489361703138 +2017-12-22,"[0, 5)",0.0 +2017-12-22,"[5, 15)",278.04808510638884 +2017-12-22,"[15, 65)",4930.415106382987 +2017-12-22,"[65, 120)",1284.3777304964558 +2017-12-29,"[0, 5)",858.7108510638328 +2017-12-29,"[5, 15)",46.34134751773648 +2017-12-29,"[15, 65)",5708.901702127674 +2017-12-29,"[65, 120)",453.30978723405207 +2018-01-05,"[0, 5)",1849.5310638297876 +2018-01-05,"[5, 15)",1061.0314927659583 +2018-01-05,"[15, 65)",15310.23638297877 +2018-01-05,"[65, 120)",1813.2391489361742 +2018-01-12,"[0, 5)",2047.6951063829774 +2018-01-12,"[5, 
15)",1760.971205673767 +2018-01-12,"[15, 65)",12974.776595744674 +2018-01-12,"[65, 120)",1359.9293617021308 +2018-01-19,"[0, 5)",2510.077872340424 +2018-01-19,"[5, 15)",4402.428014184404 +2018-01-19,"[15, 65)",19462.16489361705 +2018-01-19,"[65, 120)",1888.790780141847 +2018-01-26,"[0, 5)",2950.442411347518 +2018-01-26,"[5, 15)",9360.952198581566 +2018-01-26,"[15, 65)",23873.58893617021 +2018-01-26,"[65, 120)",1435.4809929078037 +2018-02-02,"[0, 5)",3721.080354609929 +2018-02-02,"[5, 15)",8573.149290780151 +2018-02-02,"[15, 65)",33734.41914893621 +2018-02-02,"[65, 120)",2870.9619858156157 +2018-02-09,"[0, 5)",4954.101063829788 +2018-02-09,"[5, 15)",13438.99078014183 +2018-02-09,"[15, 65)",42557.26723404253 +2018-02-09,"[65, 120)",4457.546241134744 +2018-02-16,"[0, 5)",7640.324751773054 +2018-02-16,"[5, 15)",6997.5434751773 +2018-02-16,"[15, 65)",45411.718085106404 +2018-02-16,"[65, 120)",4759.752765957449 +2018-02-23,"[0, 5)",3456.861631205672 +2018-02-23,"[5, 15)",7460.956950354612 +2018-02-23,"[15, 65)",54753.55723404256 +2018-02-23,"[65, 120)",6346.337021276598 +2018-03-02,"[0, 5)",5856.848368794328 +2018-03-02,"[5, 15)",10704.851276595748 +2018-03-02,"[15, 65)",50082.63765957452 +2018-03-02,"[65, 120)",6044.130496453893 +2018-03-09,"[0, 5)",6099.048865248232 +2018-03-09,"[5, 15)",16265.812978723407 +2018-03-09,"[15, 65)",56310.53042553193 +2018-03-09,"[65, 120)",6497.44028368794 +2018-03-16,"[0, 5)",5724.739007092199 +2018-03-16,"[5, 15)",4309.745319148934 +2018-03-16,"[15, 65)",27247.03085106384 +2018-03-16,"[65, 120)",5061.959290780132 +2018-03-23,"[0, 5)",3809.15326241135 +2018-03-23,"[5, 15)",11260.947446808517 +2018-03-23,"[15, 65)",16348.218510638304 +2018-03-23,"[65, 120)",4155.339716312062 +2018-03-30,"[0, 5)",3148.6064539007134 +2018-03-30,"[5, 15)",2224.3846808510666 +2018-03-30,"[15, 65)",5708.901702127674 +2018-03-30,"[65, 120)",2266.548936170207 +2018-04-06,"[0, 5)",660.5468085106367 +2018-04-06,"[5, 15)",926.826950354612 +2018-04-06,"[15, 
65)",1816.4687234042924 +2018-04-06,"[65, 120)",604.4130496453914 +2018-04-13,"[0, 5)",616.5103546099305 +2018-04-13,"[5, 15)",463.413475177306 +2018-04-13,"[15, 65)",1037.9821276596062 +2018-04-13,"[65, 120)",226.65489361703138 +2018-04-20,"[0, 5)",0.0 +2018-04-20,"[5, 15)",0.0 +2018-04-20,"[15, 65)",778.4865957447155 +2018-04-20,"[65, 120)",0.0 +2018-04-27,"[0, 5)",0.0 +2018-04-27,"[5, 15)",0.0 +2018-04-27,"[15, 65)",259.49553191492714 +2018-04-27,"[65, 120)",0.0 +2018-05-04,"[0, 5)",0.0 +2018-05-04,"[5, 15)",0.0 +2018-05-04,"[15, 65)",259.49553191492714 +2018-05-04,"[65, 120)",75.55163120568352 +2018-05-11,"[0, 5)",0.0 +2018-05-11,"[5, 15)",0.0 +2018-05-11,"[15, 65)",0.0 +2018-05-11,"[65, 120)",0.0 diff --git a/tutorials/influenza_1718/data/interim/data_influenza_1718_format.csv b/tutorials/influenza_1718/data/interim/data_influenza_1718_format.csv deleted file mode 100644 index 2191801..0000000 --- a/tutorials/influenza_1718/data/interim/data_influenza_1718_format.csv +++ /dev/null @@ -1,129 +0,0 @@ -DATE,AGE,CASES -2017-10-07,"[0, 5)",3.0613377040580794 -2017-10-07,"[5, 15)",11.97849001259829 -2017-10-07,"[15, 65)",153.23601314251468 -2017-10-07,"[65, 120)",37.5422941211746 -2017-10-14,"[0, 5)",14.717179504485687 -2017-10-14,"[5, 15)",16.5053120133151 -2017-10-14,"[15, 65)",294.914389376907 -2017-10-14,"[65, 120)",36.509019971050115 -2017-10-21,"[0, 5)",12.083090962016017 -2017-10-21,"[5, 15)",10.709826521951657 -2017-10-21,"[15, 65)",175.25359863839907 -2017-10-21,"[65, 120)",10.4705113879237 -2017-10-28,"[0, 5)",30.96950581152279 -2017-10-28,"[5, 15)",4.827841247134187 -2017-10-28,"[15, 65)",230.13801465712254 -2017-10-28,"[65, 120)",59.51659104714774 -2017-11-04,"[0, 5)",3.955902508639896 -2017-11-04,"[5, 15)",36.49760145898314 -2017-11-04,"[15, 65)",145.94497469416456 -2017-11-04,"[65, 120)",10.169791563544168 -2017-11-11,"[0, 5)",15.515405930525565 -2017-11-11,"[5, 15)",31.43629580329104 -2017-11-11,"[15, 65)",159.94824297418378 -2017-11-11,"[65, 
120)",22.765478924265118 -2017-11-18,"[0, 5)",30.54276037897729 -2017-11-18,"[5, 15)",33.966948631136994 -2017-11-18,"[15, 65)",229.96458437428518 -2017-11-18,"[65, 120)",52.15541609928212 -2017-11-25,"[0, 5)",18.98325695709154 -2017-11-25,"[5, 15)",1.0684618691373864 -2017-11-25,"[15, 65)",313.9841940544027 -2017-11-25,"[65, 120)",94.14104063502026 -2017-12-02,"[0, 5)",15.37330293819698 -2017-12-02,"[5, 15)",13.462340425532798 -2017-12-02,"[15, 65)",409.7133333333337 -2017-12-02,"[65, 120)",33.502705167174746 -2017-12-09,"[0, 5)",79.94117527862227 -2017-12-09,"[5, 15)",121.16106382978751 -2017-12-09,"[15, 65)",446.96000000000106 -2017-12-09,"[65, 120)",33.502705167174746 -2017-12-16,"[0, 5)",15.37330293819698 -2017-12-16,"[5, 15)",154.81691489361728 -2017-12-16,"[15, 65)",484.2066666666672 -2017-12-16,"[65, 120)",33.502705167174746 -2017-12-23,"[0, 5)",0.0 -2017-12-23,"[5, 15)",40.38702127659659 -2017-12-23,"[15, 65)",707.6866666666677 -2017-12-23,"[65, 120)",189.84866261398204 -2017-12-30,"[0, 5)",119.91176291793354 -2017-12-30,"[5, 15)",6.731170212766827 -2017-12-30,"[15, 65)",819.4266666666686 -2017-12-30,"[65, 120)",67.0054103343479 -2018-01-06,"[0, 5)",258.2714893617022 -2018-01-06,"[5, 15)",154.1168731914895 -2018-01-06,"[15, 65)",2197.55333333334 -2018-01-06,"[65, 120)",268.0216413373866 -2018-01-13,"[0, 5)",285.94343465045574 -2018-01-13,"[5, 15)",255.78446808510753 -2018-01-13,"[15, 65)",1862.3333333333323 -2018-01-13,"[65, 120)",201.01623100303996 -2018-01-20,"[0, 5)",350.51130699088134 -2018-01-20,"[5, 15)",639.4611702127669 -2018-01-20,"[15, 65)",2793.500000000004 -2018-01-20,"[65, 120)",279.18920972644423 -2018-01-27,"[0, 5)",412.0045187436677 -2018-01-27,"[5, 15)",1359.6963829787242 -2018-01-27,"[15, 65)",3426.6933333333327 -2018-01-27,"[65, 120)",212.18379939209763 -2018-02-03,"[0, 5)",519.6176393110436 -2018-02-03,"[5, 15)",1245.2664893617034 -2018-02-03,"[15, 65)",4842.066666666672 -2018-02-03,"[65, 120)",424.36759878419645 -2018-02-10,"[0, 
5)",691.798632218845 -2018-02-10,"[5, 15)",1952.0393617021257 -2018-02-10,"[15, 65)",6108.45333333333 -2018-02-10,"[65, 120)",658.8865349544062 -2018-02-17,"[0, 5)",1066.9072239108416 -2018-02-17,"[5, 15)",1016.4067021276587 -2018-02-17,"[15, 65)",6518.166666666669 -2018-02-17,"[65, 120)",703.5568085106385 -2018-02-24,"[0, 5)",482.7217122593716 -2018-02-24,"[5, 15)",1083.7184042553195 -2018-02-24,"[15, 65)",7859.046666666667 -2018-02-24,"[65, 120)",938.0757446808514 -2018-03-03,"[0, 5)",817.8597163120571 -2018-03-03,"[5, 15)",1554.9003191489367 -2018-03-03,"[15, 65)",7188.606666666674 -2018-03-03,"[65, 120)",893.4054711246191 -2018-03-10,"[0, 5)",851.6809827760899 -2018-03-10,"[5, 15)",2362.6407446808516 -2018-03-10,"[15, 65)",8082.52666666667 -2018-03-10,"[65, 120)",960.410881458966 -2018-03-17,"[0, 5)",799.4117527862209 -2018-03-17,"[5, 15)",625.9988297872337 -2018-03-17,"[15, 65)",3910.900000000001 -2018-03-17,"[65, 120)",748.2270820668676 -2018-03-24,"[0, 5)",531.9162816616011 -2018-03-24,"[5, 15)",1635.6743617021286 -2018-03-24,"[15, 65)",2346.5400000000004 -2018-03-24,"[65, 120)",614.2162613981769 -2018-03-31,"[0, 5)",439.6764640324221 -2018-03-31,"[5, 15)",323.0961702127663 -2018-03-31,"[15, 65)",819.4266666666686 -2018-03-31,"[65, 120)",335.02705167173167 -2018-04-07,"[0, 5)",92.2398176291791 -2018-04-07,"[5, 15)",134.62340425531943 -2018-04-07,"[15, 65)",260.72666666667203 -2018-04-07,"[65, 120)",89.3405471124622 -2018-04-14,"[0, 5)",86.0904964539009 -2018-04-14,"[5, 15)",67.31170212765971 -2018-04-14,"[15, 65)",148.98666666667123 -2018-04-14,"[65, 120)",33.502705167174746 -2018-04-21,"[0, 5)",0.0 -2018-04-21,"[5, 15)",0.0 -2018-04-21,"[15, 65)",111.740000000005 -2018-04-21,"[65, 120)",0.0 -2018-04-28,"[0, 5)",0.0 -2018-04-28,"[5, 15)",0.0 -2018-04-28,"[15, 65)",37.24666666667147 -2018-04-28,"[65, 120)",0.0 -2018-05-05,"[0, 5)",0.0 -2018-05-05,"[5, 15)",0.0 -2018-05-05,"[15, 65)",37.24666666667147 -2018-05-05,"[65, 120)",11.167568389059193 -2018-05-12,"[0, 
5)",0.0 -2018-05-12,"[5, 15)",0.0 -2018-05-12,"[15, 65)",0.0 -2018-05-12,"[65, 120)",0.0 diff --git a/tutorials/influenza_1718/data/interim/data_influenza_1718_format_100K.csv b/tutorials/influenza_1718/data/interim/data_influenza_1718_format_100K.csv deleted file mode 100644 index 5589514..0000000 --- a/tutorials/influenza_1718/data/interim/data_influenza_1718_format_100K.csv +++ /dev/null @@ -1,129 +0,0 @@ -DATE,AGE,CASES -2017-10-07,"[0, 5)",0.5043905150209872 -2017-10-07,"[5, 15)",0.9014971414571843 -2017-10-07,"[15, 65)",2.084137094505029 -2017-10-07,"[65, 120)",1.7030015323888286 -2017-10-14,"[0, 5)",2.424824200245443 -2017-10-14,"[5, 15)",1.2421842471975257 -2017-10-14,"[15, 65)",4.011080724425229 -2017-10-14,"[65, 120)",1.656129930579943 -2017-10-21,"[0, 5)",1.9908278872003429 -2017-10-21,"[5, 15)",0.8060179525872886 -2017-10-21,"[15, 65)",2.3835945505061287 -2017-10-21,"[65, 120)",0.4749655649965071 -2017-10-28,"[0, 5)",5.1025814517335855 -2017-10-28,"[5, 15)",0.3633417132813129 -2017-10-28,"[15, 65)",3.1300682089436145 -2017-10-28,"[65, 120)",2.6998042641907856 -2017-11-04,"[0, 5)",0.6517803315396129 -2017-11-04,"[5, 15)",2.7467972466239 -2017-11-04,"[15, 65)",1.9849729138660002 -2017-11-04,"[65, 120)",0.4613242483501386 -2017-11-11,"[0, 5)",2.556341163434414 -2017-11-11,"[5, 15)",2.365885080244943 -2017-11-11,"[15, 65)",2.1754289970554717 -2017-11-11,"[65, 120)",1.03269249791856 -2017-11-18,"[0, 5)",5.032270244897715 -2017-11-18,"[5, 15)",2.556341163434414 -2017-11-18,"[15, 65)",3.1277094130029 -2017-11-18,"[65, 120)",2.365885080244943 -2017-11-25,"[0, 5)",3.1277094130029 -2017-11-25,"[5, 15)",0.08041208197112486 -2017-11-25,"[15, 65)",4.270445912139757 -2017-11-25,"[65, 120)",4.270445912139757 -2017-12-02,"[0, 5)",2.5329280648430283 -2017-12-02,"[5, 15)",1.0131712259372498 -2017-12-02,"[15, 65)",5.572441742654514 -2017-12-02,"[65, 120)",1.519756838905843 -2017-12-09,"[0, 5)",13.171225937183413 -2017-12-09,"[5, 15)",9.11854103343467 -2017-12-09,"[15, 
65)",6.079027355623114 -2017-12-09,"[65, 120)",1.519756838905843 -2017-12-16,"[0, 5)",2.5329280648430283 -2017-12-16,"[5, 15)",11.651469098277628 -2017-12-16,"[15, 65)",6.5856129685916995 -2017-12-16,"[65, 120)",1.519756838905843 -2017-12-23,"[0, 5)",0.0 -2017-12-23,"[5, 15)",3.039513677811614 -2017-12-23,"[15, 65)",9.625126646403258 -2017-12-23,"[65, 120)",8.61195542046607 -2017-12-30,"[0, 5)",19.756838905775144 -2017-12-30,"[5, 15)",0.5065856129686571 -2017-12-30,"[15, 65)",11.144883485309043 -2017-12-30,"[65, 120)",3.039513677811614 -2018-01-06,"[0, 5)",42.553191489361716 -2018-01-06,"[5, 15)",11.598784194528886 -2018-01-06,"[15, 65)",29.888551165147 -2018-01-06,"[65, 120)",12.158054711246226 -2018-01-13,"[0, 5)",47.112462006079 -2018-01-13,"[5, 15)",19.250253292806573 -2018-01-13,"[15, 65)",25.32928064842957 -2018-01-13,"[65, 120)",9.11854103343467 -2018-01-20,"[0, 5)",57.75075987841943 -2018-01-20,"[5, 15)",48.125633232016284 -2018-01-20,"[15, 65)",37.99392097264443 -2018-01-20,"[65, 120)",12.664640324214814 -2018-01-27,"[0, 5)",67.88247213779128 -2018-01-27,"[5, 15)",102.33029381965558 -2018-01-27,"[15, 65)",46.60587639311043 -2018-01-27,"[65, 120)",9.625126646403258 -2018-02-03,"[0, 5)",85.61296859169201 -2018-02-03,"[5, 15)",93.71833839918956 -2018-02-03,"[15, 65)",65.856129685917 -2018-02-03,"[65, 120)",19.250253292806573 -2018-02-10,"[0, 5)",113.98176291793314 -2018-02-10,"[5, 15)",146.90982776089143 -2018-02-10,"[15, 65)",83.08004052684899 -2018-02-10,"[65, 120)",29.888551165146858 -2018-02-17,"[0, 5)",175.78520770010144 -2018-02-17,"[5, 15)",76.49442755825729 -2018-02-17,"[15, 65)",88.65248226950358 -2018-02-17,"[65, 120)",31.914893617021285 -2018-02-24,"[0, 5)",79.53394123606886 -2018-02-24,"[5, 15)",81.56028368794328 -2018-02-24,"[15, 65)",106.88956433637286 -2018-02-24,"[65, 120)",42.553191489361716 -2018-03-03,"[0, 5)",134.75177304964544 -2018-03-03,"[5, 15)",117.02127659574471 -2018-03-03,"[15, 65)",97.77102330293829 -2018-03-03,"[65, 
120)",40.52684903748729 -2018-03-10,"[0, 5)",140.3242147923 -2018-03-10,"[5, 15)",177.81155015197572 -2018-03-10,"[15, 65)",109.92907801418444 -2018-03-10,"[65, 120)",43.56636271529886 -2018-03-17,"[0, 5)",131.71225937183385 -2018-03-17,"[5, 15)",47.112462006079 -2018-03-17,"[15, 65)",53.191489361702146 -2018-03-17,"[65, 120)",33.94123606889557 -2018-03-24,"[0, 5)",87.63931104356642 -2018-03-24,"[5, 15)",123.10030395136786 -2018-03-24,"[15, 65)",31.914893617021285 -2018-03-24,"[65, 120)",27.862208713272572 -2018-03-31,"[0, 5)",72.44174265450872 -2018-03-31,"[5, 15)",24.316109422492428 -2018-03-31,"[15, 65)",11.144883485309043 -2018-03-31,"[65, 120)",15.197568389057713 -2018-04-07,"[0, 5)",15.197568389057713 -2018-04-07,"[5, 15)",10.131712259371856 -2018-04-07,"[15, 65)",3.5460992907802145 -2018-04-07,"[65, 120)",4.052684903748743 -2018-04-14,"[0, 5)",14.184397163120599 -2018-04-14,"[5, 15)",5.065856129685928 -2018-04-14,"[15, 65)",2.0263424518744286 -2018-04-14,"[65, 120)",1.519756838905843 -2018-04-21,"[0, 5)",0.0 -2018-04-21,"[5, 15)",0.0 -2018-04-21,"[15, 65)",1.519756838905843 -2018-04-21,"[65, 120)",0.0 -2018-04-28,"[0, 5)",0.0 -2018-04-28,"[5, 15)",0.0 -2018-04-28,"[15, 65)",0.5065856129686571 -2018-04-28,"[65, 120)",0.0 -2018-05-05,"[0, 5)",0.0 -2018-05-05,"[5, 15)",0.0 -2018-05-05,"[15, 65)",0.5065856129686571 -2018-05-05,"[65, 120)",0.5065856129686571 -2018-05-12,"[0, 5)",0.0 -2018-05-12,"[5, 15)",0.0 -2018-05-12,"[15, 65)",0.0 -2018-05-12,"[65, 120)",0.0 diff --git a/tutorials/influenza_1718/data/raw/dataset_influenza_1718.csv b/tutorials/influenza_1718/data/raw/ILI_weekly_1718_raw.csv similarity index 100% rename from tutorials/influenza_1718/data/raw/dataset_influenza_1718.csv rename to tutorials/influenza_1718/data/raw/ILI_weekly_1718_raw.csv diff --git a/tutorials/influenza_1718/data_conversion.py b/tutorials/influenza_1718/data_conversion.py index f8b6491..aa95ae8 100644 --- a/tutorials/influenza_1718/data_conversion.py +++ 
b/tutorials/influenza_1718/data_conversion.py @@ -3,16 +3,15 @@ ############################ import os -import datetime import pandas as pd +from datetime import datetime, timedelta ############### ## Load data ## ############### -abs_dir = os.path.dirname('') -rel_dir = 'data/raw/dataset_influenza_1718.csv' -data_dir = os.path.join(abs_dir,rel_dir) +rel_dir = 'data/raw/ILI_weekly_1718_raw.csv' +data_dir = os.path.join(os.getcwd(),rel_dir) data = pd.read_csv(data_dir) # Age groups in dataset desired_age_groups = pd.IntervalIndex.from_tuples([(0,5),(5,15),(15,65),(65,120)],closed='left') @@ -23,41 +22,21 @@ ## Perform conversion ## ######################## -# Convert YEAR-WEEK to a date -data = pd.read_csv(data_dir) -data['DATE']=0 +# convert YEAR-WEEK to the week's midpoint for i in range(len(data)): y=data['YEAR'][i] w=data['WEEK'][i] d = str(y)+'-W'+str(w) - r = datetime.datetime.strptime(d + '-6', "%Y-W%W-%w") + r = datetime.strptime(d + '-1', "%Y-W%W-%w") + timedelta(days=3) # Thursday (Monday + 3 days) taken as midpoint data.loc[i,'DATE']=r - +# convert age groups to pd.IntervalIndex age_groups = data['AGE'].unique() for id,age_group in enumerate(age_groups): data.loc[data['AGE']==age_group, 'AGE'] = desired_age_groups[id] data = data.drop(columns=['YEAR','WEEK']) data = data.groupby(by=['DATE','AGE']).sum().squeeze() - -# Define a dataframe with the desired format -new_DATE = data.index.get_level_values('DATE').unique() #pd.date_range(start=data.index.get_level_values('DATE').unique()[0],end=data.index.get_level_values('DATE').unique()[-1]) -iterables=[new_DATE, data.index.get_level_values('AGE').unique()] -names=['DATE', 'AGE'] -index = pd.MultiIndex.from_product(iterables, names=names) -data_new = pd.Series(index=index, name='CASES_100K', dtype=float) - -# Merge series with daily date and weekly date -data = data.to_frame()/7 -data_new = data_new.to_frame() -merge = data_new.merge(data, how='outer', on=['DATE', 'AGE'])['CASES_100K_y'] -merge.name = 'CASES' -# Loop down to
series level to perform interpolation of intermittant dates -for age_group in merge.index.get_level_values('AGE').unique(): - interpol = merge.loc[slice(None),age_group].interpolate(method='linear').values - merge.loc[slice(None),age_group] = interpol - -# Per 100K inhabitants -absolute = merge.copy() +# convert weekly totals per 100K inhabitants to absolute counts +absolute = data.copy() for age_group in absolute.index.get_level_values('AGE').unique(): absolute.loc[slice(None),age_group] = absolute.loc[slice(None),age_group].values*N.loc[age_group]/100000 @@ -66,5 +45,5 @@ ################# # Write to a new .csv file -merge.to_csv(os.path.join(abs_dir,'data/interim/data_influenza_1718_format_100K.csv')) -absolute.to_csv(os.path.join(abs_dir,'data/interim/data_influenza_1718_format.csv')) +data.to_csv(os.path.join(os.getcwd(),'data/interim/ILI_weekly_100K.csv')) +absolute.to_csv(os.path.join(os.getcwd(),'data/interim/ILI_weekly_ABS.csv'))
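The updated `data_conversion.py` performs two conversions: it maps a week number to a mid-week date, and it rescales weekly GP visits per 100K inhabitants to absolute counts. Both steps can be sketched as standalone helpers. This is a minimal sketch under stated assumptions, not the tutorial's code. `week_midpoint` and `to_absolute` are hypothetical names. `population` stands in for the script's `N`, which is assumed here to be a `pd.Series` indexed by age group. The numbers in the usage lines are made up. The sketch parses the week's Monday with the same `"%Y-W%W-%w"` format and adds three days, because Monday plus three days is Thursday.

```python
from datetime import datetime, timedelta

import pandas as pd


def week_midpoint(year: int, week: int) -> datetime:
    """Hypothetical helper: Thursday of Monday-based week `week` of `year`."""
    # Same "%Y-W%W-%w" format as data_conversion.py; weekday "1" selects Monday
    monday = datetime.strptime(f"{year}-W{week}-1", "%Y-W%W-%w")
    # Monday + 3 days = Thursday, used as the approximate midpoint of the week
    return monday + timedelta(days=3)


def to_absolute(per_100k: pd.Series, population: pd.Series) -> pd.Series:
    """Hypothetical helper: scale a (DATE, AGE)-indexed series from cases per
    100K inhabitants to absolute cases.

    `population` maps age group -> inhabitants; it is an assumed stand-in for
    the script's `N`.
    """
    ages = per_100k.index.get_level_values("AGE")
    # Look up each row's population and convert it to a per-100K scaling factor
    factors = ages.map(population).to_numpy(dtype=float) / 100_000
    return per_100k * factors


# Usage with made-up numbers (not the Sciensano data)
date = week_midpoint(2017, 48)
index = pd.MultiIndex.from_product(
    [[date], ["[0, 5)", "[5, 15)"]], names=["DATE", "AGE"]
)
per_100k = pd.Series([1.0, 2.0], index=index)
population = pd.Series({"[0, 5)": 200_000, "[5, 15)": 50_000})
print(date.date(), to_absolute(per_100k, population).tolist())  # 2017-11-30 [2.0, 1.0]
```

`%W` counts Monday-based weeks from the first Monday of the year. In general, this is not ISO 8601 week numbering. For 2017 and 2018 it coincides with ISO weeks, because the first Mondays of those years (2 January 2017 and 1 January 2018) both start ISO week 1. If the source report numbers weeks according to ISO 8601, `"%G-W%V-%u"` parses them directly.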