\documentclass[a4paper,11pt]{article}
% \usepackage{babel}
\usepackage[utf8]{inputenc}
% \usepackage[T1]{fontenc}
\usepackage{times}
\usepackage{amsmath}
\usepackage{microtype}
\usepackage{url}
\urlstyle{same}
\usepackage{color}
\definecolor{navyblue}{rgb}{0,0,0.5}% not predefined by the color package; needed for linkcolor below
\usepackage[bookmarks=false]{hyperref}
\hypersetup{%
bookmarksopen=true,
bookmarksnumbered=true,
pdftitle={Bayesian data analysis},
pdfsubject={Comments},
pdfauthor={Aki Vehtari},
pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis},
pdfstartview={FitH -32768},
colorlinks=true,
linkcolor=navyblue,
citecolor=black,
filecolor=black,
urlcolor=blue
}
% if not draft, smaller printable area makes the paper more readable
\topmargin -4mm
\oddsidemargin 0mm
\textheight 225mm
\textwidth 160mm
%\parskip=\baselineskip
\def\eff{\mathrm{rep}}
\DeclareMathOperator{\E}{E}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\Sd}{Sd}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\Bin}{Bin}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2}
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\N}{N}
\DeclareMathOperator{\U}{U}
\DeclareMathOperator{\tr}{tr}
%\DeclareMathOperator{\Pr}{Pr}
\DeclareMathOperator{\trace}{trace}
\DeclareMathOperator{\rep}{\mathrm{rep}}
\pagestyle{empty}
\begin{document}
\thispagestyle{empty}
\section*{Bayesian data analysis -- reading instructions Part IV}
\smallskip
{\bf Aki Vehtari}
\smallskip
\bigskip
\noindent
Part IV (Chapters 14--18) discusses the basics of linear and
generalized linear models with several examples. The parts discussing
computation can be useful for additional insight into these models, or
sometimes for actual computation, but it is likely that most readers
will use some probabilistic programming framework for
computation. Regression and Other Stories (ROS) by Gelman, Hill, and
Vehtari discusses linear and generalized linear models from the
modeling perspective more thoroughly.
\subsection*{Chapter 14: Introduction to regression models}
Outline of Chapter 14:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[14.1] Conditional modeling
\begin{itemize}
\item formal justification of conditional modeling
\item if joint model factorizes $p(y,x|\theta,\phi)={\color{blue}p(y|x,\theta)}p(x|\phi)$\\
we can model just ${\color{blue}p(y|x,\theta)}$
\end{itemize}
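A short derivation makes the justification concrete (assuming also
that the priors factorize, $p(\theta,\phi)=p(\theta)\,p(\phi)$): the
joint posterior is
\begin{align*}
  p(\theta,\phi \,|\, x,y)
  \propto p(y \,|\, x,\theta)\, p(x \,|\, \phi)\, p(\theta)\, p(\phi)
  = \bigl[p(y \,|\, x,\theta)\, p(\theta)\bigr]
    \bigl[p(x \,|\, \phi)\, p(\phi)\bigr],
\end{align*}
so the marginal posterior of $\theta$ is
$p(\theta \,|\, x,y) \propto p(y \,|\, x,\theta)\, p(\theta)$, and the
model for $x$ can be ignored when the interest is in $\theta$ only.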
\item[14.2] Bayesian analysis of classical regression
\begin{itemize}
\item uninformative prior on $\beta$ and $\sigma^2$
\item the connection to the multivariate normal (cf.\ Chapter 3) is useful to understand, as it reveals what the conjugate prior would be
\item closed form posterior and posterior predictive distribution
\item these properties are sometimes useful and thus good to know,
  but with probabilistic programming they are less often needed
\end{itemize}
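For reference, a minimal summary of the closed-form results with the
uninformative prior $p(\beta,\sigma^2) \propto \sigma^{-2}$, where $n$
is the number of observations and $k$ the number of columns of $X$:
\begin{align*}
  \beta \,|\, \sigma^2, y &\sim \N(\hat{\beta},\, V_\beta \sigma^2),
  \qquad \hat{\beta} = (X^TX)^{-1}X^Ty,
  \qquad V_\beta = (X^TX)^{-1},\\
  \sigma^2 \,|\, y &\sim \Invchi2(n-k,\, s^2),
  \qquad s^2 = \tfrac{1}{n-k}\,(y-X\hat{\beta})^T(y-X\hat{\beta}).
\end{align*}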
\item[14.3] Regression for causal inference: incumbency and voting
\begin{itemize}
\item a modeling example with a bit of discussion of causal inference
  (see more in ROS Chs.~18--21)
\end{itemize}
\item[14.4] Goals of regression analysis
\begin{itemize}
\item discussion of what we can do with regression analysis (see
more in ROS)
\end{itemize}
\item[14.5] Assembling the matrix of explanatory variables
\begin{itemize}
\item transformations, nonlinear relations, indicator variables,
interactions (see more in ROS)
\end{itemize}
\item[14.6] Regularization and dimension reduction
\begin{itemize}
\item a bit outdated and short (the Bayesian Lasso is not a good idea);
  see more in lecture 9.3,
  \url{https://avehtari.github.io/modelselection/}, and
  \url{https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html}
\end{itemize}
\item[14.7] Unequal variances and correlations
\begin{itemize}
\item useful concept, but computation is easier with probabilistic
programming frameworks
\end{itemize}
\item[14.8] Including numerical prior information
\begin{itemize}
\item useful conceptually; probabilistic programming frameworks make
  it easier to define prior information, as the prior doesn't need to
  be conjugate
\item see more about priors in \url{https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations}
\end{itemize}
\end{list}
\subsection*{Chapter 15: Hierarchical linear models}
Chapter 15 combines hierarchical models from Chapter 5 and linear
models from Chapter 14. The chapter discusses some computational
issues, but probabilistic programming frameworks make computation for
hierarchical linear models easy.
\vspace{\baselineskip}
\noindent
Outline of Chapter 15:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[15.1] Regression coefficients exchangeable in batches
\begin{itemize}
\item exchangeability of parameters
\item the discussion of fixed-, random- and mixed-effects models
is incomplete
\begin{itemize}
\item we don't recommend using these terms, but they are so
popular that it's useful to know them
\item a relevant comment is \emph{The terms ‘fixed’ and ‘random’
come from the non-Bayesian statistical tradition and are
somewhat confusing in a Bayesian context where all unknown
parameters are treated as ‘random’ or, equivalently, as
having fixed but unknown values.}
\item often fixed effects correspond to population-level
  coefficients, random effects correspond to group- or
  individual-level coefficients, and a mixed model has both (see also
  the sketch below)\\
\begin{tabular}[t]{ll}
{\tt y $\sim$ 1 + x} & fixed / population effect; pooled model\\
{\tt y $\sim$ 1 + (0 + x | g) } & random / group effects \\
{\tt y $\sim$ 1 + x + (1 + x | g) } & mixed effects; hierarchical model
\end{tabular}
\end{itemize}
\end{itemize}
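A minimal sketch in R with brms, matching the formulas above (the
data frame {\tt d} and the variables {\tt y}, {\tt x}, and {\tt g} are
hypothetical placeholders):
\begin{verbatim}
library(brms)
# pooled model: population ("fixed") intercept and slope only
fit_pooled <- brm(y ~ 1 + x, data = d)
# hierarchical ("mixed") model: population intercept and slope plus
# group ("random") intercepts and slopes for grouping factor g
fit_hier <- brm(y ~ 1 + x + (1 + x | g), data = d)
\end{verbatim}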
\item[15.2] Example: forecasting U.S. presidential elections
\begin{itemize}
\item illustrative example
\end{itemize}
\item[15.3] Interpreting a normal prior distribution as extra data
\begin{itemize}
\item includes a very useful interpretation of the hierarchical
  linear model as a single linear model with a certain design matrix
\end{itemize}
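A compact statement of the idea (cf.\ also Section 14.8): a normal
prior $\beta \sim \N(\beta_0, \Sigma_\beta)$ can be appended to the
data as extra ``observations'', after which ordinary weighted linear
regression machinery applies,
\begin{align*}
  y_* = \begin{pmatrix} y \\ \beta_0 \end{pmatrix}, \qquad
  X_* = \begin{pmatrix} X \\ I_k \end{pmatrix}, \qquad
  \Sigma_* = \begin{pmatrix} \sigma^2 I_n & 0 \\ 0 & \Sigma_\beta \end{pmatrix}.
\end{align*}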
\item[15.4] Varying intercepts and slopes
\begin{itemize}
\item extends the hierarchical model for a scalar parameter to a
  joint hierarchical model for several parameters
\end{itemize}
\item[15.5] Computation: batching and transformation
\begin{itemize}
\item the Gibbs sampling part is mostly outdated
\item transformations for HMC are useful if you write your own
  models, but the section is quite short; you can get more
  information from Stan user guide 21.7 Reparameterization and
  \url{https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html}
\end{itemize}
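The key trick (the non-centered parameterization discussed in the
linked case study) can be stated compactly:
\begin{align*}
  \theta_j \sim \N(\mu, \tau^2)
  \quad\Longleftrightarrow\quad
  \theta_j = \mu + \tau\,\tilde{\theta}_j, \quad
  \tilde{\theta}_j \sim \N(0, 1),
\end{align*}
which removes the strong prior dependence between $\theta_j$ and
$(\mu,\tau)$ that can cause divergences in HMC when the data are
weakly informative.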
\item[15.6] Analysis of variance and the batching of coefficients
\begin{itemize}
\item ANOVA as a Bayesian hierarchical linear model
\item the rstanarm and brms packages make ANOVA easy
\end{itemize}
\item[15.7] Hierarchical models for batches of variance components
\begin{itemize}
\item more variance components
\end{itemize}
\end{list}
\subsection*{Chapter 16: Generalized linear models}
Chapter 16 extends linear models to non-normal observation
models. The model in the bioassay example in Chapter 3 is also a
generalized linear model. The chapter reviews the basics and discusses
some computational issues, but probabilistic programming frameworks
make computation for generalized linear models easy (especially with
rstanarm and brms). Regression and Other Stories (ROS) by Gelman,
Hill, and Vehtari discusses generalized linear models from the
modeling perspective more thoroughly.
\vspace{\baselineskip}
\noindent
Outline of Chapter 16:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[16 Intro:]
Parts of generalized linear model (GLM):
\begin{itemize}
\item[1.] The linear predictor $\eta = X\beta$
\item[2.] The link function $g(\cdot)$ and $\mu = g^{-1}(\eta)$
\item[3.] Outcome distribution model with location parameter $\mu$
\begin{itemize}
\item the distribution can also depend on dispersion
parameter $\phi$
\item originally just exponential-family distributions
  (e.g.\ Poisson, binomial, negative-binomial), which all have a
  natural location-dispersion parameterization
\item after MCMC made computation easy, GLM can also refer to
  models where the outcome distribution is not part of the
  exponential family and the dispersion parameter may have its own
  latent linear predictor
\end{itemize}
\end{itemize}
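A concrete instance of the three parts, for Poisson regression with
the log link:
\begin{align*}
  \eta_i = X_i\beta, \qquad
  \mu_i = g^{-1}(\eta_i) = \exp(\eta_i), \qquad
  y_i \sim \mathrm{Poisson}(\mu_i).
\end{align*}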
\item[16.1] Standard generalized linear model likelihoods
\begin{itemize}
\item the section title says ``likelihoods'', but it would be better to say ``observation models''
\item continuous data: normal, gamma, and Weibull are mentioned, but
  Student's $t$, log-normal, log-logistic, and various extreme-value
  distributions such as the generalized Pareto are also common
\item binomial (with Bernoulli as a special case) for binary data and
  count data with an upper limit
\begin{itemize}
\item Bioassay model uses binomial observation model
\end{itemize}
\item Poisson for count data with no upper limit
\begin{itemize}
\item the Poisson is a useful approximation of the binomial when the
  observed counts are much smaller than the upper limit (made
  precise below)
\end{itemize}
\end{itemize}
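The approximation mentioned above can be stated precisely: if
$y \sim \Bin(n, p)$ with large $n$ and small $p$, then approximately
$y \sim \mathrm{Poisson}(np)$, exactly so in the limit $n \to \infty$,
$p \to 0$ with $np \to \lambda$ fixed.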
\item[16.2] Working with generalized linear models
\begin{itemize}
\item a bit of this-and-that information on how to think about GLMs
  (see ROS for more)
\item the normal approximation to the likelihood is good for thinking
  about how much information non-normal observations provide; it can
  be useful for someone thinking about computation, but with easy
  computation in probabilistic programming frameworks, not everyone
  needs it
\end{itemize}
\item[16.3] Weakly informative priors for logistic regression
\begin{itemize}
\item an excellent section, although the recommendation to use the
  Cauchy has changed (see
  \url{https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations})
\item the problem of separation is useful to understand
\item computation part is outdated as probabilistic programming
frameworks make the computation easy
\end{itemize}
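A minimal sketch in R with rstanarm of a logistic regression with a
weakly informative normal prior, in the spirit of the linked
recommendations (the data frame {\tt d} and the variables {\tt y} and
{\tt x} are hypothetical placeholders):
\begin{verbatim}
library(rstanarm)
# weakly informative normal prior on the coefficients; even under
# complete separation it keeps the posterior well behaved
fit <- stan_glm(y ~ x, family = binomial(), data = d,
                prior = normal(0, 2.5))
\end{verbatim}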
\item[16.4] Overdispersed Poisson regression for police stops
\begin{itemize}
\item an example
\end{itemize}
\item[16.5] State-level opinions from national polls
\begin{itemize}
\item another example
\end{itemize}
\item[16.6] Models for multivariate and multinomial responses
\begin{itemize}
\item extension to multivariate responses
\item polychotomous data with multivariate binomial or Poisson
\item models for ordered categories
\end{itemize}
\item[16.7] Loglinear models for multivariate discrete data
\begin{itemize}
\item multinomial or Poisson as loglinear models
\end{itemize}
\end{list}
\subsection*{Chapter 17: Models for robust inference}
Chapter 17 discusses overdispersed observation models. The discussion
is useful beyond generalized linear models. The computation is
outdated. See Regression and Other Stories (ROS) by Gelman, Hill, and
Vehtari for more examples.
\vspace{\baselineskip}
\noindent
Outline of Chapter 17:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[17.1] Aspects of robustness
\begin{itemize}
\item overdispersed models are often connected to the robustness of
  inferences to outliers, but the observed data can be overdispersed
  without any observation being an outlier
\item an outlier is a sensible notion only in the context of the
  model: something not well modelled or something requiring an extra
  model component
\item switching to a generic overdispersed model can help to
  recognize problems in the non-robust model (sensitivity analysis),
  but it can also throw away useful information in the ``outliers'';
  it would be useful to think about the generative mechanism for
  observations that are not like the others
\end{itemize}
\item[17.2] Overdispersed versions of standard models\\
\begin{tabular}[t]{lcl}\small
normal & $\rightarrow$ & $t$-distribution\\
Poisson & $\rightarrow$ & negative-binomial \\
binomial & $\rightarrow$ & beta-binomial \\
probit & $\rightarrow$ & logistic / robit
\end{tabular}
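For example, the Poisson $\rightarrow$ negative-binomial row arises
as a continuous mixture: if
\begin{align*}
  y \,|\, \lambda \sim \mathrm{Poisson}(\lambda), \qquad
  \lambda \sim \mathrm{Gamma}(\alpha, \beta),
\end{align*}
then marginally $y$ has a negative-binomial distribution, and the
extra variability in $\lambda$ produces the overdispersion; similarly
the $t$ distribution is a normal with an $\Invchi2$-distributed
variance.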
\item[17.3] Posterior inference and computation
\begin{itemize}
\item computation part is outdated as probabilistic programming
frameworks and MCMC make the computation easy
\item posterior is more likely to be multimodal
\end{itemize}
\item[17.4] Robust inference for the eight schools
\begin{itemize}
\item the eight schools example is too small to see much difference
\end{itemize}
\item[17.5] Robust regression using $t$-distributed errors
\begin{itemize}
\item computation part is outdated as probabilistic programming
frameworks and MCMC make the computation easy
\item posterior is more likely to be multimodal
\end{itemize}
\end{list}
\subsection*{Chapter 18: Models for missing data}
Chapter 18 extends the data-collection modelling from Chapter 8. See
Regression and Other Stories (ROS) by Gelman, Hill, and Vehtari for
more examples.
\vspace{\baselineskip}
\noindent
Outline of Chapter 18:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[18.1] Notation
\begin{itemize}
\item Missing completely at random (MCAR)\\
  missingness does not depend on the missing values or on the
  observed values (including covariates)
\item Missing at random (MAR)\\
missingness does not depend on missing values but may depend on
other observed values (including covariates)
\item Missing not at random (MNAR)\\
missingness depends on missing values
\end{itemize}
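In notation, with inclusion indicator $I$ and missingness-mechanism
parameters $\phi$, MAR is, for example, the statement
\begin{align*}
  p(I \,|\, y_{\mathrm{obs}}, y_{\mathrm{mis}}, x, \phi)
  = p(I \,|\, y_{\mathrm{obs}}, x, \phi),
\end{align*}
under which (together with distinct parameters) the missingness
mechanism can be ignored when modelling the observed data.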
\item[18.2] Multiple imputation
\begin{itemize}
\item[1.] make a model predicting missing data
\item[2.] sample repeatedly from the missing data model to generate
multiple imputed data sets
\item[3.] do the usual inference for each imputed data set
\item[4.] combine the results (Rubin's rules, shown below)
\item discussion of computation is partially outdated
\end{itemize}
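The combining step (Rubin's rules) for a scalar estimand $\theta$,
with point estimates $\hat{\theta}_j$ and variance estimates $W_j$
from $m$ imputed data sets:
\begin{align*}
  \bar{\theta} = \frac{1}{m}\sum_{j=1}^m \hat{\theta}_j, \qquad
  \bar{W} = \frac{1}{m}\sum_{j=1}^m W_j, \qquad
  B = \frac{1}{m-1}\sum_{j=1}^m (\hat{\theta}_j - \bar{\theta})^2,
\end{align*}
with total variance estimate $T = \bar{W} + (1 + \tfrac{1}{m})\,B$.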
\item[18.3] Missing data in the multivariate normal and $t$ models
\begin{itemize}
\item computation for a special continuous-data case, which can
  still be useful as a fast starting point
\end{itemize}
\item[18.4] Example: multiple imputation for a series of polls
\begin{itemize}
\item an example
\end{itemize}
\item[18.5] Missing values with counted data
\begin{itemize}
\item discussion of computation for count data (i.e.\ the computation
  in 18.3 is not applicable)
\end{itemize}
\item[18.6] Example: an opinion poll in Slovenia
\begin{itemize}
\item another example
\end{itemize}
\end{list}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: