---
title: "Advanced use demo: non-linear time series analysis"
output:
  html_document:
    toc: true
    toc_float: true
    theme: readable
    highlight: haddock
    fig_width: 10
    fig_height: 4
---
## 1. Background
This tutorial shows you how to use `mgcv` for non-linear time series data.
With time series, the values you observe at a current time, $y(t)$, are assumed
to depend on the previous values of the time series: **_y(t) = f(y(t-1),
y(t-2),...)_**. Compare this to standard statistical methods, which assume each
data point is independent of the others once you account for any covariates, or
spatial statistics, which assume that the value at a given point depends on all
its neighbours, not just the ones in one direction from it.
A good example of time series data would be a population of animals fluctuating
over time. If a population is currently well above its carrying capacity, it's
likely going to be above its carrying capacity the next time we survey it as
well. To be able to effectively model this population, we need to understand how
it changes over time in response to its own density. Generalized additive
models, using `mgcv`, are very useful for this as they allow you to model
current values as a non-linear function of past values.
This tutorial has two major sections. The first (section 3) covers using GAMs
to model trends in the presence of autocorrelated background noise. The second
(section 4) focuses on using GAMs to model complex nonlinear dependencies
between time points, for cases where you really care about how current
observations depend on previous ones (for instance, when estimating population
dynamics). You don't have to do both sections; each stands alone, so feel free
to work through whichever of the two is more useful for you.
## 2. Key concepts and functions:
Here are a few key ideas and R functions you should familiarize yourself with
if you haven't already encountered them. R functions will be highlighted
`like this`; use `?function_name` to look them up. If you're familiar with
time series statistics, skip to the next part.
### Cyclic smooths {#smoother}
This exercise will use a type of smoother we didn't cover in the intro: the
cyclical smooth. Standard smooth terms assume that the left and right end of the
range of your x variable have nothing to do with each other; one can be much
higher or lower than the other. However, there are several types of data where the
predictor starts and ends at the same point. Think of time of day: 11:59 and
00:00 are almost the same time, even though they occur on different days. In
this tutorial, we'll work with seasonal (monthly) data, and similarly, month 1
and month 12 are very close to each other. As such, the start and end of such a
smooth should also be very close; if our model predicts that it should be 10
degrees C at midnight, it should also predict the same temperature at 11:59:59.
`mgcv` allows us to easily model this type of data, using a cyclic smooth basis.
The standard smooth terms we've used so far are built so that their second
derivatives are zero at the ends of the data. The cyclic smooth instead requires
that the value and the first and second derivatives match at the two ends of the range.
To use a cyclic smooth, you need to code the smooth term like this: `s(x,
bs="cc")`. By default, `mgcv` assumes that the largest and smallest values of x
are the end points. If that isn't true, you can specify the end points of the
range as shown below and in the first example using the knots argument.
This code shows how the smooth basis functions vary between the default thin
plate ("tp"), cyclic ("cc") and cyclic smooth with end points outside the range
of the data. Note that no matter how you vary the cyclic smooths, the ends will
always be equal. For more info on cyclic smooths, refer to
`?smooth.construct.cc.smooth.spec`
```{r, echo=T,tidy=F,results="hide", include=T, message=FALSE,highlight=TRUE}
library(mgcv)
dat = data.frame(x = seq(0, 1, length = 100))
# Build (without fitting) the basis functions for each smoother type
tp_smooth = smooth.construct2(s(x, bs = "tp", k = 4), data = dat, knots = NULL)
cc_smooth = smooth.construct2(s(x, bs = "cc", k = 4), data = dat, knots = NULL)
cc_smooth2 = smooth.construct2(s(x, bs = "cc", k = 4), data = dat,
                               knots = list(x = c(-0.5, 1.5)))
# Plot the three sets of basis functions side by side
layout(matrix(1:3, nrow = 1))
matplot(dat$x, tp_smooth$X, type = "l", main = "Thin plate spline")
matplot(dat$x, cc_smooth$X, type = "l", main = "Cyclic spline")
matplot(dat$x, cc_smooth2$X, type = "l", main = "Cyclic spline with wider knots")
layout(1)
```
### Glossary of terms
_Trend_: The long-term value around which a time series is fluctuating. For
instance, if the habitat for a population is slowly deteriorating, it will show
a decreasing trend.
_Seasonal effect_: Many time series show a strong seasonal pattern, so the
average value of the series depends on time of year. For instance, our
hypothetical population may reproduce in spring, so population numbers would be
higher then and decline throughout the year. We can use [cyclic
smoothers](#smoother) to model these effects.
_Stationary time series_: This is a time series without any trend or seasonal
components, so if you looked at multiple realizations of the time series, the
mean and variance (and other statistical properties) averaged across
realizations would not change with time.
_Autocorrelation function (`acf`)_: This is a function that describes how
correlated a time series is with itself at different time lags; that is, the acf
of a time series $x$ is $acf(x,i) = cor(x(t), x(t-i))$, where $i$ is a given lag.
In `R`, use the function `acf(x)` to plot the acf of the series `x`.
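As a quick check of this definition (a sketch using a built-in series, not one
of this tutorial's data sets):
```{r, eval=FALSE}
# sketch: the lag-1 acf is (essentially) the correlation of the series with
# itself shifted by one step; small differences come from how acf normalizes
# by the full-series mean and length
x = as.numeric(LakeHuron)
n = length(x)
cor(x[-1], x[-n])                 # compare to:
acf(x, lag.max = 1, plot = FALSE)
```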
_`gamm`_: This function fits a generalized additive *mixed effects* model, using
the `nlme` package. We haven't discussed it much in the rest of the course, but
it's a tool for fitting more complicated mixed effects models than the
`bs="re"` method we discussed earlier allows. More importantly for this
tutorial, it also allows you to add correlations between errors into your model,
specifying how the errors are related to one another using a range of models.
Look up `?corStruct` if you're interested in the range of possible error
correlations. The `gamm` function returns a list with two items: a `gam` model
containing the smooth terms, which works like the `gam` models you've been
working with so far, and an `lme` object containing the random effects and, in
our case, the correlation structure.
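Schematically (a sketch, not runnable as-is: `y`, `x`, `group`, and `dat` are
placeholders):
```{r, eval=FALSE}
# sketch: a gamm fit is a list with $gam and $lme components
m = gamm(y ~ s(x), correlation = corAR1(form = ~1|group), data = dat)
summary(m$gam)  # smooth terms; use like any gam model
summary(m$lme)  # random effects and error-correlation structure
```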
## 3. Nonlinear decomposition of time series: trend, seasonal, and random values
In many cases, what we're interested in is estimating how things are changing
over time, and the fact that the time series is autocorrelated is a nuisance; it
makes it harder to get reliable estimates of the trends we're actually
interested in. In this section, we'll show you how to estimate non-linear
trends, while accounting for autocorrelation in residuals [^trendnote].
[^trendnote]: This is an expanded
example based on one on Gavin's blog, [available here](http://www.fromthebottomoftheheap.net/2016/03/25/additive-modeling-global-temperature-series-revisited/). His example also has some great code on how to model
the derivatives of the trend.
This example assumes that a time series can be broken down into two main
components: a mean that changes smoothly over time, and an autocorrelated
error term, so the model for the dependent variable looks like:
$$y(t) = f(t) + \epsilon(t)$$
$$\epsilon(t) = \sum_{i=1}^{p} \alpha_i \cdot \epsilon(t-i) + rnorm(0, \sigma)$$
where $f(t)$ is a smooth function of time, $\alpha_i$ are the auto-regressive
coefficients of an order-$p$ model, and $\sigma$ is the standard deviation of
the model errors.
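To make the model concrete, here's a minimal simulation sketch (all names
invented for illustration) of a smooth trend plus AR(1) errors:
```{r, eval=FALSE}
# sketch: simulate y(t) = f(t) + AR(1) noise
set.seed(1)
t = 1:200
f = sin(t/20)                                     # a smooth trend f(t)
eps = as.numeric(arima.sim(list(ar = 0.7), 200))  # autocorrelated errors
y = f + eps
plot(t, y, type = "l")
lines(t, f, col = "red")                          # trend vs. observed series
```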
### Loading and viewing the data
The first time series we'll look at is a long-term temperature and precipitation
data set from the convention center here at Fort Lauderdale, consisting of
monthly mean temperature (in degrees C), the number of days it rained each
month, and the amount of precipitation that month (in mm). The data range from
1950 to 2015, with columns for year and month (with Jan. = 1). The data come
from a great web app called FetchClim2, and the link for the data [I used is
here](http://fetchclimate2.cloudapp.net/#page=geography&dm=values&t=years&v=airt(13,6,2,1),prate(12,7,2,1),wet(1)&y=1950:1:2015&dc=1,32,60,91,121,152,182,213,244,274,305,335,366&hc=0,24&p=26.099,-80.123,Point%201&ts=2016-07-17T21:57:11.213Z).
Here we'll load it and plot the air temperature data. Note that air temperature
shows a strong seasonal pattern and is strongly auto-correlated.
```{r, echo=T,tidy=F,include=T, message=FALSE,highlight=TRUE}
library(dplyr)
library(mgcv)
library(ggplot2)
library(tidyr)
florida = read.csv("data/time_series/Florida_climate.csv",
stringsAsFactors = F)
head(florida)
ggplot(aes(x=month, y=air_temp,color=year,group=year),
data= florida)+
geom_point()+
geom_line()
acf(florida$air_temp,lag.max = 36)
```
Looking at the time series, it does look like later years are
warmer, but we should model it to see how much warmer.
### Modelling yearly and seasonal trends
The first model we'll fit will include both a yearly and a seasonal trend. Note
that the seasonal trend uses a cyclic smoother, as January and December values
should be close to one another. Also note that we've specified the `knots`
argument for the month term, with knots at 0.5 and 12.5 months. If we just left
it as is, `mgcv` would assume that month 1 and month 12 were the end points,
and should therefore have the same value; since January and December have
different mean temperatures, this would create a bias. By specifying knots like
this, we're stating that the point of equality should fall halfway between the
middle of December and the middle of January.
```{r, echo=T,tidy=F,include=T, message=FALSE}
model_temp_basic = gamm(air_temp~s(year)+s(month,bs="cc",k=12),data=florida,
knots = list(month = c(0.5, 12.5)))
plot(model_temp_basic$gam,page=1,scale=0)
summary(model_temp_basic$gam)
```
It appears that there's about an 8 degree C difference between the warmest and coolest
months, and that temperatures have been rising non-linearly, with a much faster rate
of increase since 2010. The model seems to fit well; however, if we look at the acf,
there's still substantial unexplained autocorrelation:
```{r, echo=T,tidy=F,include=T, message=FALSE}
acf(residuals(model_temp_basic$gam),lag.max = 36)
```
To incorporate the auto-correlated error structure, we need to use the slightly
more complex `gamm` function. This uses the `nlme` package to fit the model,
which allows us to add correlation functions (in this case, an auto-regressive
model) to our GAM. The auto-regressive order-$p$ model we'll add assumes that
the residuals at time $t$ follow $\epsilon_t = \sum_{i=1}^{p}
\alpha_i\epsilon_{t-i} + rnorm(0,\sigma)$. The more lags you add to the
auto-regressive model, the more complex the time dependencies you can capture.
Also note that we specify the form of the correlation as `form = ~1|year`. This
means temperatures will only be assumed to be correlated within a given year,
so there's no assumed correlation between the temperature in December of one
year and January of the next. This is just to speed up computations for the
workshop; in an actual analysis, we would not recommend ignoring these
correlations[^sequential_data].
[^sequential_data]: Also note that if your data are not organized sequentially
(so that `gamm` can assume that the prior row is the observation that occurs
immediately before the current one), you'd have to give `corARMA` a variable
specifying the time each observation occurs at. You could do that for this data
using:
`florida = mutate(florida, time = as.Date(paste(year, month, "15", sep = "-")))`
`corARMA(form = ~time|year, p = 2)`
```{r, echo=T,tidy=F,include=T, message=FALSE}
model_temp_autoreg1 = gamm(air_temp~s(year)+s(month,bs="cc",k=12),
correlation = corARMA(form=~1|year,p = 1),
data=florida, knots = list(month = c(0.5, 12.5)))
model_temp_autoreg2 = gamm(air_temp~s(year)+s(month,bs="cc",k=12),
correlation = corARMA(form=~1|year,p = 2),
data=florida, knots = list(month = c(0.5, 12.5)))
layout(matrix(1:6, nrow = 2,byrow = F))
plot(model_temp_basic$gam,scale=0,main="basic model")
plot(model_temp_autoreg1$gam,scale=0,main="lag-1 model")
plot(model_temp_autoreg2$gam,scale=0,main="lag-2 model")
layout(1)
layout(matrix(1:3, nrow = 1))
acf(residuals(model_temp_basic$lme,type="normalized"),main="basic model",lag.max = 36)
acf(residuals(model_temp_autoreg1$lme,type="normalized"),main="lag-1 model",lag.max = 36)
acf(residuals(model_temp_autoreg2$lme,type="normalized"),main="lag-2 model",lag.max = 36)
layout(1)
```
Note that including the auto-regressive model resulted in a substantially
smoother estimate of the long-term trend. Warming appears to still be speeding
up, although not at the same rapid rate as in the naive model. Adding the first
lag also seems to remove most of the auto-correlation in the data, although
there appears to be at least some long-period auto-correlation at the 12-month
scale. We can use an ANOVA to test whether adding the first or second
autoregressive term actually results in a better-fitting model than the naive
one.
```{r, echo=T,tidy=F,include=T, message=FALSE}
anova(model_temp_basic$lme, model_temp_autoreg1$lme, model_temp_autoreg2$lme)
```
It appears that adding the first autoregressive term substantially improved the model,
but the second lag term may not be necessary.
### Exercises
1. Modify the model of temperature and season to allow the seasonal effect to
change over time. Is there evidence for changing seasonal patterns? How does
this change once temperature autocorrelation is accounted for? Hint: this will
require the use of tensor product smooths (see the syntax sketch after these
exercises).
2. Construct a model to estimate seasonal and long-term trends in precipitation
for the convention center. What lags are necessary to include in the model?
Remember that precipitation is a positive variable (i.e. cannot go below zero),
so you may want to transform the variable to better suit the model's
assumptions.
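For exercise 1, a possible starting point for the tensor smooth syntax (a
sketch only; choices of `k`, knots, and the correlation structure are up to
you):
```{r, eval=FALSE}
# sketch: a tensor product of a thin plate basis for year and a cyclic
# basis for month lets the seasonal pattern change across years
model_interact = gamm(air_temp ~ te(year, month, bs = c("tp", "cc")),
                      data = florida, knots = list(month = c(0.5, 12.5)))
```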
## 4. Stationary nonlinear autoregressive models and forecasting
So far we've focused on estimating trends where the presence of auto-correlation
is just a nuisance effect we need to account for to get accurate estimates.
However, in population and community ecology, we often find the time dependence
itself to be interesting, as it can give us information on how the population is
changing in response to its own density; that is, it helps us estimate what
factors are affecting population growth rates. We can use this information to
estimate what the equilibrium (carrying capacity) population density is, whether
this equilibrium is stable, and how quickly we expect the population to return to
equilibrium after perturbation. We can also use these methods to forecast the
population into the future, which is very useful for managers[^lynxnote].
[^lynxnote]: This example is based on one by the statistician Cosma Shalizi, at
Carnegie Mellon University, from his book Advanced Data Analysis from an
Elementary Point of View (the example starts at page 517 of the pdf draft
[available here](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf)).
This book is a great learning tool for stats in general, and for more on using
GAMs.
We also know, though, that density-dependent effects in ecology are often
strongly non-linear, so that observed densities at a given time may be a complex
function of lagged density values: $x(t) = f(x(t-1), x(t-2)...)$. Therefore, the
linear correlation equations we were just working with may not suffice to
estimate the real population dynamics. This is where `mgcv` comes in, letting us
analyze complex time-dependent relationships.
### Loading and viewing the data
The second data set we'll work on is the famous `lynx` data: the annual numbers
of lynx caught by Hudson's Bay trappers from 1821 to 1934. This is one of the
most well-known data sets in time series analysis (and in population ecology),
as it shows clearly cyclic behaviour, with a period of 9-10 years. It is also
heavily studied in time series statistics, because standard (linear) time
series methods have a hard time capturing that strong cyclic behaviour.
Here we'll load it, convert it into a data frame that `mgcv`
is able to handle, and plot the data. Note the strong cyclicity, obvious from the
acf function.
```{r, echo=T,tidy=F,include=T, message=FALSE}
library(dplyr)
library(mgcv)
library(ggplot2)
library(tidyr)
data(lynx)
lynx_full = data.frame(year=1821:1934, population = as.numeric(lynx))
head(lynx_full)
ggplot(aes(x=year, y=population), data= lynx_full)+
geom_point()+
geom_line()
acf(lynx_full$population)
```
Next we'll add some extra columns to the data, representing lagged population
values, using the `lag` function from the `dplyr` package. Note that we've
log-transformed the population data for the lag values; our model will assume
we're dealing with log-responses, so this makes interpreting plots easier. Also
note that we specified the `default` argument of `lag` as the mean log
population. This replaces the `NA`s generated at the start of the time series
(where no lagged values are possible) with the mean value, so we don't end up
throwing out data, and so we can compare models with different lags (as they'll
have the same number of data points).
We'll also split the data set into two: a training data set to fit models, and
a 40-year testing data set to check how well our models predict out of sample.
This is often a crucial step when fitting this sort of data; it is very easy to
overfit time series data, and holding data in reserve gives us a way to test
for overfitting.
```{r, echo=T,tidy=F,include=T, message=FALSE}
mean_pop_l = mean(log(lynx_full$population))
lynx_full = mutate(lynx_full,
popl = log(population),
lag1 = lag(popl,1, default = mean_pop_l),
lag2 = lag(popl,2, default = mean_pop_l),
lag3 = lag(popl,3, default = mean_pop_l),
lag4 = lag(popl,4, default = mean_pop_l),
lag5 = lag(popl,5, default = mean_pop_l),
lag6 = lag(popl,6, default = mean_pop_l))
lynx_train = filter(lynx_full, year<1895)
lynx_test = filter(lynx_full, year>=1895)
```
### Fitting and testing linear models
We'll start by fitting a series of linear auto-regressive models of increasing
order, and seeing how well each forecasts the test data, using one-step-ahead
forecasting, where we try to forecast the next step for each time point, given
the observed data series:
```{r, echo=T,tidy=F,include=T, message=FALSE,fig.width=12}
lynx_lm1 = gam(population~lag1,
data=lynx_train, family = "poisson", method="REML")
lynx_lm2 = update(lynx_lm1,formula = population~lag1+lag2)
lynx_lm3 = update(lynx_lm1,formula = population~lag1+lag2+lag3)
lynx_lm4 = update(lynx_lm1,formula = population~lag1+lag2+lag3+lag4)
round(AIC(lynx_lm1,lynx_lm2,
lynx_lm3,lynx_lm4))
lynx_predict_lm = mutate(
lynx_full,
lag1_model = as.vector(exp(predict(lynx_lm1,lynx_full))),
lag2_model = as.vector(exp(predict(lynx_lm2,lynx_full))),
lag3_model = as.vector(exp(predict(lynx_lm3,lynx_full))),
lag4_model = as.vector(exp(predict(lynx_lm4,lynx_full))),
data_type = factor(ifelse(year<1895, "training","testing"),
levels= c("training","testing"))
)
#Gathers all of the predictions into two columns: the model name and the predicted value
lynx_predict_lm = gather(lynx_predict_lm,key= model, value= pop_est,
lag1_model:lag4_model )%>%
mutate(pop_est = as.numeric(pop_est))
forecast_accuracy_lm = lynx_predict_lm %>%
group_by(model)%>%
filter(data_type=="testing")%>%
summarize(out_of_sample_r2 = round(cor(log(pop_est),log(population))^2,2))
print(forecast_accuracy_lm)
ggplot(aes(x=year, y=population), data= lynx_predict_lm)+
geom_point()+
geom_line()+
geom_line(aes(y=pop_est,color=model))+
scale_color_brewer(palette="Set1")+
facet_grid(.~data_type,scales = "free_x")+
theme_bw()
```
Note that the AIC table shows that adding each lag substantially reduced model
AIC, up to lag 4. Some work on this time series shows that you need up to 11
(!!) linear lag terms to fit the model effectively. Even the long-lagged models
under-estimate population densities when the cycle is at a peak, and seem to
mis-estimate the period of the cycle; the predicted peaks in the testing data
occur 1-2 years after they do in the observed data. Further, looking at the
out-of-sample estimates of $R^2$, $R^2$ is highest for the two-lag model.
### Fitting and testing non-linear models
Now let's look at how adding non-linear lag terms improves the model. We've
constrained the degrees of freedom of each smooth term because there are
complex correlations between the lags, and giving the smooths too much freedom
can produce complex, difficult-to-interpret fits as well as poor out-of-sample
performance.
```{r, echo=T,tidy=F,include=T, message=FALSE,fig.width=12}
lynx_gam1 = gam(population~s(lag1,k=5),
data=lynx_train, family = "poisson", method="REML")
lynx_gam2 = update(lynx_gam1,population~s(lag1,k=5)+s(lag2,k=5))
lynx_gam3 = update(lynx_gam1,population~s(lag1,k=5)+s(lag2,k=5)+s(lag3,k=5))
lynx_gam4 = update(lynx_gam1,population~s(lag1,k=5)+s(lag2,k=5)+s(lag3,k=5)+
s(lag4,k=5))
round(AIC(lynx_gam1,lynx_gam2,lynx_gam3,lynx_gam4))
lynx_predict_gam = mutate(
lynx_full,
lag1_model = as.vector(exp(predict(lynx_gam1,lynx_full))),
lag2_model = as.vector(exp(predict(lynx_gam2,lynx_full))),
lag3_model = as.vector(exp(predict(lynx_gam3,lynx_full))),
lag4_model = as.vector(exp(predict(lynx_gam4,lynx_full))),
data_type = factor(ifelse(year<1895, "training","testing"),
levels= c("training","testing"))
)
#Gathers all of the predictions into two columns: the model name and the predicted value
lynx_predict_gam = gather(lynx_predict_gam,key= model, value= pop_est,
lag1_model:lag4_model)%>%
mutate(pop_est = as.numeric(pop_est))
forecast_accuracy_gam = lynx_predict_gam %>%
group_by(model)%>%
filter(data_type=="testing")%>%
summarize(out_of_sample_r2 = round(cor(log(pop_est),log(population))^2,2))
print(forecast_accuracy_gam)
ggplot(aes(x=year, y=population), data= lynx_predict_gam)+
geom_point()+
geom_line()+
geom_line(aes(y=pop_est,color=model))+
scale_color_brewer(palette="Set1")+
facet_grid(.~data_type, scales="free_x")+
theme_bw()
```
The nonlinear models capture the timing and magnitude of the peaks in the
cycle more effectively than any of the linear models, and explain more of the
out-of-sample variance, with the 2-lag model accounting for 86% of it.
```{r, echo=T,tidy=F,include=T, message=FALSE}
plot(lynx_gam2, page=1,scale=0)
```
Our best-fitting model, the 2-lag nonlinear model, indicates that log-lynx
densities increase roughly linearly with the prior year's log-density, but
decrease nonlinearly when population densities two years previous are high,
decreasing at a faster rate when (log) densities two years previous are very
high. You'll see this sort of lagged response in predator-prey cycles.
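One way to view the two lag effects together (a sketch; `vis.gam` shows the
combined additive surface on the response scale):
```{r, eval=FALSE}
# sketch: joint (additive) effect of lag1 and lag2 on predicted population
vis.gam(lynx_gam2, view = c("lag1", "lag2"), plot.type = "contour",
        type = "response")
```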
### Exercises:
1. Focusing on the two-lag model: is there any evidence of an interaction
between the population at lag 1 and the population at lag 2? Make sure to choose
appropriate interaction terms, keeping in mind what you know about the data.
2. We ignored confidence intervals on our forecasts. For the 2-lag linear and
nonlinear models, construct appropriate confidence intervals and determine
whether the observed out-of-sample population densities fall within them.
3. We assumed that errors were Poisson-distributed for this exercise. However,
we know many natural populations are over-dispersed; that is, they show more
variation than we'd expect from a Poisson distribution. Choose a distribution to
model this extra variation. How does it affect your model estimates, and the
estimated degree of non-linearity present?
4. CHALLENGING: We focused, for simplicity, on one-step-ahead forecasting.
However, we often want to forecast over longer time horizons. If you have the
time, try figuring out how you can use this model to forecast whole trajectories
of the out-of-sample data. Do the forecast trajectories for the linear or
non-linear models have the same dynamic properties as the observed data? Hint:
this will likely require a for-loop, and you'll have to calculate new lagged
terms at each step (one possible skeleton is sketched below).
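For exercise 4, one possible loop skeleton (a sketch, not a full solution; it
assumes the 2-lag nonlinear model and the columns defined above):
```{r, eval=FALSE}
# sketch: iterated (multi-step) forecasting with lynx_gam2 -- feed each
# prediction back in as the next step's lag values
horizon = nrow(lynx_test)
traj = numeric(horizon)
lag1 = lynx_test$lag1[1]   # log-densities observed just before the test period
lag2 = lynx_test$lag2[1]
for (i in 1:horizon) {
  pred = as.numeric(predict(lynx_gam2,
                            newdata = data.frame(lag1 = lag1, lag2 = lag2)))
  traj[i] = exp(pred)      # predictions are on the log (link) scale
  lag2 = lag1              # shift the lags forward one step
  lag1 = pred
}
```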