---
title: "6+ Things You Need to Know About Cluster Randomization"
author: "Jake Bowers and Ashlea Rundlett"
date: "November 5, 2014"
output:
  html_document:
    fig_caption: true
    number_sections: true
    toc: true
    toc_depth: 2
citation-abbreviations: false
bibliography: cluster.bib
---
\newcommand{\pr}{\text{pr}}
\newcommand{\Dt}{\Delta t}
\newcommand{\bz}{\mathbf{z}}
\newcommand{\bm}{\mathbf{m}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bY}{\mathbf{Y}}
\newcommand{\bI}{\mathbf{I}}
\newcommand{\e}{\mathrm{e}}
\newcommand{\E}{\mathrm{E}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bX}{\mathbf{X}}
\newcommand{\bZ}{\mathbf{Z}}
\newcommand{\be}{\mathbf{e}}
\newcommand{\var}{\mathrm{Var}}
\newcommand{\cov}{\mathrm{Cov}}
\newcommand{\bpsi}{\boldsymbol{\psi}}
\newcommand{\bmu}{\boldsymbol{\mu}}%m
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\bdelta}{\boldsymbol{\delta}}
\newcommand{\bSigma}{\boldsymbol{\Sigma}}
\newcommand{\bOmega}{\boldsymbol{\Omega}}
\newcommand{\bgamma}{\boldsymbol{\gamma}}
```{r setup, include=FALSE,echo=FALSE,results='hide'}
require(knitr)
knitr::opts_chunk$set(message=FALSE,warning=FALSE,eval=TRUE,results='markup',cache=FALSE, comment=NA, tidy=FALSE)
options(digits=3,scipen=6)
```
**How to use and improve this guide.**
This guide and all of the code it contains are available for copying at <https://github.com/bowers-illinois-edu/EgapMethodsGuides>. We encourage you to copy and improve the guide. See <https://guides.github.com/activities/forking/> for one workflow in which you copy the guide, make your own changes, and then request that we include your changes in the main guide.
This guide involves a lot of R code (@R-Core-Team:2014aa, <http://r-project.org>) that we use to show how things work and to let you experiment with the code and adapt it for your own purposes.
# What it is, what it isn't, and why we use it.
Cluster randomized experiments allocate treatments across entire groups of individuals. Such studies typically measure outcomes at the level of the individual even though the intervention occurs at the level of the group. A study randomly assigning villages to receive different development programs but measuring individual-level outcomes and a study randomly assigning households to receive different voter mobilization messages but measuring the vote turnout of individuals are both cluster randomized experiments: villages and households are the assignment units and individuals are the outcome units. A study which samples villages from the experimental pool and then randomly assigns some of the people in each village to a treatment is not a cluster randomized experiment: in this study, the unit of assignment is the individual. A study which randomly assigns villages to an intervention and then measures village-level responses is also not a cluster randomized study: it is a village-level study; the units of assignment and outcome are the same.
Cluster randomization is not block randomization. If an experimenter believes that outcomes vary with pre-treatment covariates *and/or* that treatment effects will differ in substantively important ways between subgroups, she may divide the experimental pool into groups that are homogeneous on those covariates (like gender or village size) and randomize within those blocks --- effectively turning one large experiment into multiple mini-experiments. For example, one can first collect clusters of individuals (schools, classrooms, villages, households) into blocks and then have a blocked cluster randomized experiment in order to learn about subgroup differences in treatment effects and also to increase the precision of statistical tests. In this guide, we ignore blocking in order to focus on cluster-level or group-level treatment assignment with individual-level measurement.
Researchers commonly randomize at the cluster level either because they care about village/school/household level interventions in and of themselves *or* because reaching the individuals within those clusters is too costly and difficult.
# Why Cluster Randomization can cause problems
Cluster randomized experiments have at least two units of analysis: we commonly see a relatively small number of assignment units ($J$ clusters), each containing some number of outcome units ($n_j$ individuals in cluster $j$), so the total sample size depends on both assignment and outcome units: $N=\sum_{j=1}^{J} n_j$.
Cluster randomized experiments raise two new questions for analysts. The first
question is about weighting, or how to combine information from different parts
of the same experiment into one quantity. If clusters are not all the
same size (i.e. $n_j \ne n_k$ for $j \ne k$) then an average treatment effect
must be *defined* in a weighted fashion and estimation should also involve
weights. What weights should one use? On what basis should one choose weights?
One component of weights should account for the size of the cluster (larger
clusters tell us more about the treatment effect than smaller clusters ceteris
paribus). Another component would add that homogeneous clusters (where all
villagers behave in the same way in response to treatment) tell us less about
the treatment effect than heterogeneous clusters (where each villager acts as
if she were more or less independent of the others). If the study is blocked,
then the analyst needs to choose both a block-weighting scheme and a cluster-weighting scheme. @hansen2008 discuss optimal weights for precise testing in blocked and cluster-randomized designs. @imai2009essential discuss weighting schemes for estimation in paired and cluster-randomized designs.
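To make the cluster-size component of this weighting concrete, here is a minimal sketch using made-up cluster-level summaries (all numbers below are hypothetical): it contrasts an unweighted average of cluster means with a version that weights each cluster mean by its cluster size (the latter reproduces the individual-level difference of means).
```{r weightsketch}
## Hypothetical cluster-level summaries: 2 treated and 2 control clusters of unequal size
toyCl <- data.frame(cluster = 1:4,
                    Z       = c(1, 1, 0, 0),
                    size    = c(10, 40, 10, 40),
                    ybar    = c(0.9, 0.3, 0.5, 0.1))
## Unweighted: every cluster counts the same regardless of its size
unweighted <- with(toyCl, mean(ybar[Z == 1]) - mean(ybar[Z == 0]))
## Size-weighted: larger clusters count more (this equals the individual-level difference of means)
sizeweighted <- with(toyCl, weighted.mean(ybar[Z == 1], size[Z == 1]) -
                            weighted.mean(ybar[Z == 0], size[Z == 0]))
c(unweighted = unweighted, sizeweighted = sizeweighted)
```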
The second question is about information. We commonly summarize the
information content of a study using the total number of participants, $N$. For
example, we tend to imagine that a study with 10 people has less information
about the experimental intervention than a study with 100 people ceteris
paribus. Yet, two studies with $J=10$ villages and $n_j=100$ people per village
may have different information about the treatment effect on individuals if, in
one study, individuals within a village are more or less independent of each
other versus more or less dependent. If, say, all of the individuals in any
village acted exactly the same but different villages showed different
outcomes, then we would have on the order of 10 pieces of information: all of
the information about causal effects in that study would be at the village
level. Alternatively, if the individuals within a village acted more or less
independently of each other, then we would have on the order of 10 $\times$
100=1000 pieces of information. For a given variable, we can formalize the idea
that the highly dependent clusters provide less information than the highly
independent clusters with the intracluster correlation coefficient. For a given
variable, $x$, we can write the intracluster correlation coefficient like so:
$$ \rho \equiv \frac{\text{variance between clusters in } x}{\text{total variance in } x} \equiv \frac{\tau_x^2}{\tau_x^2 + \sigma_x^2} $$
where $\sigma_x^2$ is the variance within clusters and $\tau_x^2$ is the
variance across clusters. For example, @kish65 uses this description of
dependence to define his idea of the "effective N" of a study (in the sample
survey context, where samples may be clustered):
$$\text{effective N}=\frac{N}{1+(n_j -1)\rho}=\frac{Jn}{1+(n-1)\rho}.$$
where the second expression follows if all of the clusters are the same size ($n_1= \ldots =n_J \equiv n$).
If 200 observations arose from 10 clusters with 20 individuals within each cluster, where 50\% of the variation could be attributed to cluster-to-cluster differences (and not to differences within a cluster), Kish's formula would suggest that we have the equivalent of about 19 pieces of independent information, not 10 $\times$ 20=200 pieces.
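A quick calculation confirms this, using the same numbers as in the example above:
```{r kisheffn}
## Kish's effective N for J=10 equal-sized clusters of n=20 individuals with rho=.5
J <- 10; n <- 20; rho <- .5
(J * n) / (1 + (n - 1) * rho) ## about 19, not 200
```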
The inflation in the variance (and thus the standard error) of estimators of the average treatment effect depends on $\rho$ as well. In this simple case with 10 clusters, each of size 20, and $\rho=.5$, the variance of the estimator accounting for $\rho$ is $1+(n-1)\rho=10.5$ times larger than the variance that a simple $t$-test from a linear regression would provide, so the standard error is about $\sqrt{10.5} \approx 3.2$ times larger: if $\text{Var}(\hat{\bar{y}}_{Z_{ij}=1})=s^2/m$ for a group of $m$ individuals under independent assignment, then accounting for clustered assignment with equal-sized clusters would give us $\text{Var}(\hat{\bar{y}}_{Z_{ij}=1})=(s^2/m)\left(1+(n-1)\rho\right)$.^[See the following pieces for more general discussion of the problems that arise from clustered designs in the study of politics: @stokboweerr:2002, @stokbowe:2002, @green2007acr, @arceneaux2009modeling.]
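And, in code, the corresponding variance and standard error inflation (again using the illustrative values $n=20$ and $\rho=.5$):
```{r deffsketch}
## Design effect (variance inflation) and SE inflation for n=20 individuals per cluster, rho=.5
n <- 20; rho <- .5
deff <- 1 + (n - 1) * rho ## the variance is about 10.5 times larger
c(varianceInflation = deff, seInflation = sqrt(deff))
```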
The fact that most clustered designs contain less information than observations
can lead to invalid statistical inferences. If, for example, the true standard
error is ten times the estimated standard error, then our confidence intervals
and statistical tests will be wildly invalid --- we will be rejecting the null
of no effects much too often. Without accounting for the design, we will be
misled by reports that we have ample information to reject a null of no
effects: We will claim that a result is "statistically significant" when it is
not.
# Statistical inference for the average treatment effect in cluster randomized experiments part 1: Design-Based Approaches
## What is a standard error?
How would an estimate of the average treatment effect vary if we repeated the
experiment on the same group of villages and people within villages? The
standard error of an estimate of the average treatment effect is one answer to
this question. Here, for example, we simulate a simple, individual-level
experiment to develop intuition about what a standard error is.^[See [link to
other EGAP Methods manual] for a demonstration that the difference of means in
the observed treatment and control groups is an unbiased estimator of the
average treatment effect itself and what it means to be unbiased.]
```{r simplese, cache=TRUE}
N<-100
tau<-.25
y0<-rnorm(N) ## potential outcomes to control
y1<-y0+tau ## potential outcomes to treatment are a simple function of y0
Zrealized<-sample(rep(c(0,1),N/2)) ## Assign treatment to half
fastmean<-function(x){
  ## A Fast mean calculator (see the mean.default function for all it does that we do not want to do)
  .Internal(mean(x))
}
fastvar<-function(x){
  ## See var for this. Right now it removes missing values
  .Call(stats:::C_cov,x,NULL,5,FALSE)
}
simEstAte<-function(Z,y1,y0){
  ## A function to re-assign treatment and recalculate the difference of means
  Znew<-sample(Z)
  Y<-Znew*y1+(1-Znew)*y0
  estate<-fastmean(Y[Znew==1])-fastmean(Y[Znew==0])
  return(estate)
}
set.seed(12345)
simpleResults<-replicate(100000,simEstAte(Z=Zrealized,y1=y1,y0=y0))
sd(simpleResults) ## The standard error of the estimate of the ATE.
```
Although this preceding standard error is intuitive (it is merely the standard
deviation of the distribution arising from repeating the experiment), more
statistics-savvy readers will recognize closed-form expressions for the
standard error like the following (see @gerber2012field and
@dunning2012natural for easy to read explanations and derivations of the
design-based standard error of the simple estimator of the average treatment
effect). If we write $T$ as the set of all treated units and $C$ as the set of all non treated units, we might write
$$ \widehat{\text{Var}}(\hat{\tau})=s^2(Y_{i,i \in T})/m + s^2(Y_{i,i \in C}/(n-m) $$
where $m$ is the number assigned to treatment and $s^2(x)=(1/n-1) \sum_{i=1}^n (x_i - \bar{x})^2$. Here we compare the results of the simulation to this most common standard error as well as to the "true" version (which requires that we know the potential outcomes so as to calculate their covariance):
```{r analyticse}
## True SE (Dunning Chap 6, Gerber and Green Chap 3 and Freedman, Pisani and Purves A-32) including the covariance between the potential outcomes
V<-var(cbind(y0,y1))
varc<-V[1,1]
vart<-V[2,2]
covtc<-V[1,2]
n<-sum(Zrealized)
m<-N-n
varestATE<-((N-n)/(N-1))*(vart/n) + ((N-m)/(N-1))* (varc/m) + (2/(N-1)) * covtc
seEstATE<-sqrt(varestATE)
## And the finite sample *feasible* version (where we do not observe the potential outcomes) and so we do not have the covariance
Yobs<-Zrealized*y1+(1-Zrealized)*y0
varYc<-var(Yobs[Zrealized==0])
varYt<-var(Yobs[Zrealized==1])
fvarestATE<-(N/(N-1)) * ( (varYt/n) + (varYc/m) )
estSEEstATE<-sqrt(fvarestATE)
## Here we use the HC2 standard error --- which Lin 2013 shows is the randomization-justified SE for OLS.
library(sandwich)
library(lmtest)
lm1<-lm(Yobs~Zrealized)
## Other SEs
iidSE<- sqrt(diag(vcov(lm1)))[["Zrealized"]]
## Worth noting that if we had covariates in the model we would want this one (which is identical to the previous one without covariates).
NeymanSE<-sqrt(diag(vcovHC(lm1,type="HC2")))[["Zrealized"]]
c(simSE=sd(simpleResults),feasibleSE=estSEEstATE, trueSE=seEstATE, olsIIDSE=iidSE, NeymanDesignSE=NeymanSE)
```
We see that the feasible (also known as the conservative) SE and the true SE are the same to 3 digits here, whereas the OLS versions are a bit smaller and the simulated standard error is also very close. These standard errors will
diverge when covariates are introduced into the linear model. And, of course,
the true version is rarely calculable since we rarely have access to the true
potential outcomes.
## Standard Errors reflecting cluster-assignment of treatment
To begin, we will create a function which simulates a cluster randomized
experiment with fixed intracluster correlation.^[Code available on the github
repository shows that the ICC will increase as an additive cluster-level
treatment effect increases. See @mathieu2012understanding and
@mathieu2012understandingErr for some code that inspired the code we use here.]
```{r}
ClusteredData<-function(J,n,tau,rho){
  ## Inspired by Mathieu et al, 2012, Journal of Applied Psychology
  if (J %% 2 != 0 | n %% 2 !=0) {
    stop(paste("Number of clusters (J) and size of clusters (n) must be even."))
  }
  ## If we do not inflate the variation in the baseline potential outcome (y0)
  ## by a function of tau (cluster level additive effect), then as tau increases,
  ## the ICC will necessarily increase.
  ## y0j<-rnorm(J,0,sd=sqrt(rho))
  y0j<-rnorm(J,0,sd=(1+tau)^2 * sqrt(rho))
  dat<-expand.grid(i=1:n,J=1:J)
  ## dat$y0 <- rnorm(n*J,0,sd=sqrt(1-rho))+y0j[dat$J]
  dat$y0 <- rnorm(n*J,0,sd=(1+tau)^2 * sqrt(1-rho))+y0j[dat$J]
  dat$y1 <- with(dat,fastmean(y0)+tau+(y0-fastmean(y0))*(2/3)) ## give treated group mean shift of tau but also smaller variance
  dat$Zi <- ifelse(dat$J %in% sample(1:J,J/2) == TRUE, 1, 0)
  dat$Y <- with(dat, Zi*y1 + (1-Zi)*y0)
  return(dat)
}
```
```{r, echo=FALSE,eval=FALSE}
ClusteredData2<- function(J,n,tau,rho){
  ## In this example, the ICC for Y increases as tau increases even if the ICC for y0 is constant
  ## y0 is potential outcome in absence of treatment. It will have some cluster dependence
  dat<-data.frame(i=1:(J*n),J=rep(1:J,n))
  dat$y0 <- sqrt(rho/(1-rho))*rnorm(J)[dat$J] + rnorm(J*n,sd=sqrt(1-rho))
  dat$y1<-with(dat,mean(y0)+tau+(y0-mean(y0))*(2/3)) ## give treated smaller variance
  dat$Zi <- ifelse(dat$J %in% sample(1:J,J/2) == TRUE, 1, 0)
  dat$Y <- with(dat, Zi*y1 + (1-Zi)*y0)
  return(dat)
}
```
```{r checkrho, eval=FALSE, echo=FALSE}
library(ICC)
library(parallel)
source("code/mydeff.r")
checkICCs<-function(J,n,rho,tau){
  dat2<-ClusteredData2(J=J,n=n,tau=tau,rho=rho)
  dat<-ClusteredData(J=J,n=n,tau=tau,rho=rho)
  dat$JF<-factor(dat$J)
  dat2$JF<-factor(dat2$J)
  return(c(datICC=ICCbare(y=Y,x=JF,data=dat)[[1]]-rho,
           dat2ICC=ICCbare(y=Y,x=JF,data=dat2)[[1]]-rho,
           mydeff=mydeff(cluster=dat$JF,y=dat$Y)[[3]]-rho,
           mydeff2=mydeff(cluster=dat2$JF,y=dat2$Y)[[3]]-rho))
}
checkICCs(100,10,.1,tau=10)
checkICCs(100,10,.1,tau=.1)
thetaus<-seq(0,10,length=50)
blah<-mclapply(thetaus,function(atau){
  replicate(10,checkICCs(100,10,rho=.1,tau=atau))
},mc.cores=detectCores())
names(blah)<-thetaus
tmp<-simplify2array(blah)
plot(range(thetaus),range(as.vector(tmp)),type="n")
points(thetaus,apply(tmp["datICC",,],2,median),col="red")
points(thetaus,apply(tmp["dat2ICC",,],2,median),col="blue")
points(thetaus,apply(tmp["mydeff",,],2,median),col="red")
points(thetaus,apply(tmp["mydeff2",,],2,median),col="blue")
```
We now simulate some data from a simple cluster-randomized design for use in the demonstrations that follow.
```{r}
set.seed(12345)
pretendDat<-ClusteredData(J=100,n=10,tau=.25,rho=.1)
```
## Cluster-Level Analysis
When the assignment units are clusters and the analysis units are individuals,
the standard error refers to repeated reassignments of treatment to clusters.
One way to avoid the problem of changing the way that standard errors are
calculated is to analyze the data at the level of the cluster --- that
is, to take averages or sums of the outcomes within the clusters, and then to
treat the study as a single-level study (with weights if the clusters vary in
size) (See @dunning2012natural for an argument in favor of this approach).
@hansen2008 also recommend a variant of this approach which we demonstrate
here. They write a test statistic as a cluster-weighted difference of means:
$d(\bz,\by) =\bz^{T}\by/\bz^{T} \bm - (1-\bz)^{T} \by / (1-\bz)^{T} \bm$ where
$\bm$ is a $J \times 1$ vector recording the size of each cluster, and $\bz$
and $\by$ are also both $J \times 1$ vectors recording the treatment assignment
and observed outcome of each cluster respectively. They then show that one can
write a difference of means as a shifted sum, $d(\bz,\by) =\bz^{T}\by *(n/(m_0
n_t (n - n_t))) - \mathbf{1}^{T} \by /( m_0 (n - n_t))$ where $m_0$ is the size
of a cluster, $n$ is the size of the experimental pool of clusters, and $n_t$
is the number of treated clusters (using their notation to make study of that
paper easier) and thus that we can characterize the distribution of the
difference of means using what we know about the distribution of the only
random piece of this expression, the sum of the outcome in the treatment group
--- $\bz^{T}\by$. We demonstrate this approach here:
```{r}
## Make a data frame at the cluster level
clusterDat<-data.frame(Yj=tapply(pretendDat$Y,pretendDat$J,sum),
                       Zj=tapply(pretendDat$Zi,pretendDat$J,unique))
row.names(clusterDat)<-attr(clusterDat$Zj,"dimnames")[[1]]
clustersize<-table(pretendDat$J)
clusterDat[names(clustersize),"mj"]<-clustersize
clusterDat$ids<-as.numeric(row.names(clusterDat))
## Three equivalent formulas following Hansen and Bowers 2008 for the difference of means given clustered treatment assignment (and equal sized clusters and no blocking).
dp1<-function(x,z,m){
  crossprod(z,x)/crossprod(z,m) - crossprod(1-z,x)/crossprod(1-z,m)
}
dp2<-function(x,z,m0){
  ## m0 is a scalar cluster size
  nt<-sum(z)
  nc<-sum(1-z)
  k0<-m0*nt
  k1<-m0*nc
  crossprod(z,x)/k0 - crossprod((1-z),x)/k1
}
dp3<-function(x,z,m0,nt=sum(z),n=length(z)){
  ## m0 is a scalar cluster size
  ones<-rep(1,length(x))
  crossprod(z,x)*(n/(m0*nt*(n-nt))) - crossprod(ones,x)/(m0*(n-nt))
}
## Given equal sized clusters and no blocking, this is just the difference of means:
c(dp1= with(clusterDat,dp1(Yj,Zj,mj)),
  dp2= with(clusterDat,dp2(Yj,Zj,unique(mj))),
  dp3= with(clusterDat,dp3(Yj,Zj,unique(mj))),
  meandiff = with(pretendDat,mean(Y[Zi==1])-mean(Y[Zi==0])) )
```
Here we use the @hansen2008 approach to calculate the standard error of the
difference in means as an estimator of the average treatment effect.
```{r}
## Now we use the Hansen and Bowers 2008 approach for the variance of the cluster-level diff of means (recall var(a*x)=a^2*var(x))
# Notice that we have the cluster-size in the denominator where, in the simple calc of the design-based variance of the total of Y in the treatment group we do not refer to cluster size
# We are still assuming equal cluster sizes.
## Because the dp3 representation has only one random term crossprod(z,x), we can approximate the design based variance of this test statistic dp() using the variance(crossprod(z,x)*constant)
n<-nrow(clusterDat) ## number of clusters
nt<-sum(clusterDat$Zj) ## fixed number of treatment assigned clusters
## This from Lohr on Variance of sample total.
##Vtot <- with(clusterDat,(n^2) * (1-(nt/n)) * ( var(Yj[Zj==1])/nt ))
##sqrt(Vtot)
## Testing our code
## library(survey)
## des<-svydesign(ids=~ids,data=clusterDat[clusterDat$Zj==1,],fpc=rep(.5,n/2))
##thetot<-svytotal(~Yj,design=des)
##thetot
m0<-unique(clusterDat$mj) ## assuming all clusters same size
Vdp<-(n/(m0*nt*(n-nt)))*(var(clusterDat$Yj)/m0)
sqrt(Vdp)
## We can evaluate the hypothesis of no effects with this Normal approximation:
## First show that our analytics are on target
set.seed(12345)
thedist<-with(clusterDat,replicate(10000,dp3(Yj,sample(Zj),m0)))
sd(thedist)
plot(density(thedist))
curve(dnorm(x,mean=0,sd=sqrt(Vdp)),col="red",add=TRUE)
obsmeandiff<-with(clusterDat,dp3(Yj,Zj,m0))[1,1]
pSim<-2*min( mean(thedist>=obsmeandiff), mean(thedist<=obsmeandiff))
##pH0<-2*min(pnorm(c(-1,1)*obsmeandiff,mean=0,sd=sqrt(Vdp)))
pH0<-2*(1-pnorm(abs(obsmeandiff)/sqrt(Vdp),mean=0))
c(SimulatedP=pSim,HansenBowersP=pH0)
```
And here we combine the preceding into a general-use function for testing the null of no effects in large samples using the difference of means.
```{r hbtest}
hbtest<-function(x,z,m0,n=length(z),nt=sum(z)){
  ## Hansen and Bowers 2008 based test for diff of means with cluster level assignment
  ## assuming same size clusters. See also that article for other caveats about the Normal
  ## approximation used here.
  obsmeandiff<-dp3(x=x,z=z,m0=m0,nt=nt,n=n)[1,1]
  ## Returns a two tailed p-value for the test of the null of no effects
  Vdp<-(n/(m0*nt*(n-nt)))*(fastvar(x)/m0)
  ## tailp<-pnorm(obsmeandiff/sqrt(Vdp))
  ## 2*min(c(tailp,1-tailp))
  return( 2*(1-pnorm(abs(obsmeandiff)/sqrt(Vdp))) )
}
with(clusterDat,hbtest(x=Yj,z=Zj,m0=m0))
```
## Individual Level Analysis accounting for clustering
Alternatively, one could adjust the IID standard error for clustering. If all
clusters are the same size, $n_1=\ldots=n_J=n$, and the experiment has no blocking, the formula
that summarizes the repeated re-assignment of treatment to clusters is:
$$\text{Var}_\text{clustered}(\hat{\tau})=\frac{\sigma^2}{\sum_{j=1}^J \sum_{i=1}^{n_j} (Z_{ij}-\bar{Z})^2} \left(1+(n-1)\rho\right)$$
where $\sigma^2$ is the residual variance, $\sigma^2=\sum_{j=1}^J \sum_{i=1}^{n_j} (Y_{ij} - \hat{Y}_{ij})^2/(N-2)$ (following @arceneaux2009modeling).
This adjustment is commonly known as the "Robust Clustered Standard Error" or
RCSE. Here we demonstrate how the RCSE relates to the standard error calculated
by (1) ignoring the clustering and (2) repeating the experiment using known potential outcomes.
If we repeat the experiment we can see the variation in the treatment effect
calculated at the individual level induced by re-assignment of treatment to
clusters.
```{r rcsesim, cache=TRUE}
rcse <- function(model, cluster){
  ## clustered standard errors from http://drewdimmery.com/robust-ses-in-r/
  ## require(sandwich)
  ## require(lmtest)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- model$rank
  dfc <- (M/(M - 1)) * ((N - 1)/(N - K))
  uj <- apply(estfun(model), 2, function(x) tapply(x, cluster, sum))
  rcse.cov <- dfc * sandwich(model, meat = crossprod(uj)/N)
  rcse.se <- coeftest(model, rcse.cov)
  return(list(rcse.cov, rcse.se))
}
## Yet another way to calculate a mean difference:
lm2<-lm(Y~Zi,data=pretendDat)
lm2rcse<-rcse(lm2,pretendDat$J)
iidSE2<- sqrt(diag(vcov(lm2)))[["Zi"]]
rcSE2<-sqrt(diag(lm2rcse[[1]]))[["Zi"]]
## Now benchmark these analytic results against what we would see if we knew the potential outcomes and could repeat the design.
simEstAteClustered<-function(J,y1,y0){
  ## Equal sized clusters and equal numbers of treated and control clusters
  z.sim <- sample(1:max(J), max(J)/2)
  Znew <- ifelse(J %in% z.sim == TRUE, 1, 0)
  Y<-Znew*y1+(1-Znew)*y0
  estate<-fastmean(Y[Znew==1])-fastmean(Y[Znew==0])
  return(estate)
}
set.seed(12345)
clusterResults<-replicate(10000,
                          simEstAteClustered(J=pretendDat$J,
                                             y1=pretendDat$y1,
                                             y0=pretendDat$y0))
c(simClusterSE=sd(clusterResults),
  rcSE=rcSE2,
  iidSE2=iidSE2)
```
When we ignore the clustered assignment, the standard error is too small. The RCSE and the directly simulated version agree, within simulation error, to three digits. Now, one question is whether the RCSE is a good approximation to directly repeating the experiment when the experiment is small. (We set aside questions about the coverage of confidence intervals for the next section.)
Here we show that the RCSE is sensitive to the number of clusters.
```{r compareclusters,cache=TRUE}
## And now showing that the simulated and rcse results become the same
## once the number of clusters is large (revealing that the rcse is consistent in the number of clusters).
## require(ICC)
compareClusterSEs<-function(J,n,tau){
  set.seed(12345)
  pretendDat3<-ClusteredData(J,n,tau,.1)
  ## theicc<-ICCbare(y=Y,x=J,data=pretendDat3)
  set.seed(12345)
  clusterResults3<-replicate(10000,
                             simEstAteClustered(J=pretendDat3$J,
                                                y1=pretendDat3$y1,
                                                y0=pretendDat3$y0))
  lm3<-lm(Y~Zi,data=pretendDat3)
  lm3rcse<-rcse(lm3,pretendDat3$J)
  iidSE3<- sqrt(diag(vcov(lm3)))[["Zi"]]
  rcSE3<-sqrt(diag(lm3rcse[[1]]))[["Zi"]]
  return(c(## ICC=theicc[[1]],
           simClusterSE=sd(clusterResults3),
           rcSE=rcSE3,
           iidSE2=iidSE3))
}
compareClusterSEs(4,10,.25)
compareClusterSEs(30,10,.25)
compareClusterSEs(100,10,.25)
compareClusterSEs(1000,10,.25)
```
Standard errors accounting for clustering (the ``rcSE``) and the simulation-based standard error (``simClusterSE``) come to resemble one another more and more as the number of clusters increases. Meanwhile, the standard error ignoring clustering tends to be smaller than either of the other standard errors. This raises the question of when tests and confidence intervals
based on these standard errors will perform well and when they will perform
poorly: the fact that some standard errors are smaller than others does not
tell us that we should avoid the small ones. Thus, in the next section we
assess the false positive rate of tests based on these standard errors.
Notice that by holding cluster size fixed here we can assume that the size of
the cluster is unrelated to the potential outcomes. If cluster sizes do vary
then @middleton2008bias and @middleton2011unbiased teach us how to adjust
standard errors and estimates of the ATE to avoid bias. See also @small2008rig
and @hansenbowers2009att for flexible approaches to design-based tests of
causal effects in experiments with cluster-assignment and non-compliance,
unequal sized clusters, and covariates used to increase precision.
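As a taste of why varying cluster sizes matter, here is a rough sketch (for intuition only, not the specific estimators proposed in those papers) of a Horvitz-Thompson style estimator built from cluster totals, which avoids dividing by the random number of individuals who happen to land in the treated clusters. With the equal-sized clusters in `clusterDat` it coincides with the simple difference of means; with unequal cluster sizes the two can differ.
```{r htsketch}
## A Horvitz-Thompson style difference built from cluster totals (a sketch, for intuition only).
## Each cluster is assigned to treatment with probability Jt/J, so dividing the sum of the
## treated clusters' totals by that probability estimates the total outcome under treatment.
htDiff <- function(Yj, Zj, mj){
  ## Yj: cluster totals of the outcome; Zj: cluster treatment indicator; mj: cluster sizes
  J  <- length(Zj)
  Jt <- sum(Zj)
  M  <- sum(mj) ## total number of individuals, known by design
  totT <- sum(Yj[Zj == 1]) / (Jt / J)       ## estimated total if all clusters were treated
  totC <- sum(Yj[Zj == 0]) / ((J - Jt) / J) ## estimated total if no clusters were treated
  (totT - totC) / M
}
with(clusterDat, htDiff(Yj = Yj, Zj = Zj, mj = mj))
```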
## Design-based performance of different methods of accounting for clustered assignment.
Here we show that the false positive rate of tests based on
RCSE standard errors tends to exceed the nominal level when the number of clusters is
small. The individual-level plus cluster-correction approach becomes valid only
when the number of clusters is large.^[A valid test is one with an error rate
less than or equal to the declared level of the test -- here pegged at
$\alpha=.05$. Valid tests may be overly conservative, but not overly liberal.]
```{r errorrates, cache=TRUE}
confint.HC<-function (b, df, level = 0.95, thevcov, ...) {
  ## CI for lm with custom vcov
  ## a stripped down copy of the confint.lm function adding "thevcov" argument
  a <- (1 - level)/2
  a <- c(a, 1 - a)
  fac <- qt(a, df)
  ses <- sqrt(diag(thevcov))
  b + ses %o% fac
}
simAteZeroClustered<-function(J,Y){
  ## Make the true relationship equal zero by shuffling Z but not revealing new potential outcomes
  z.sim <- sample(1:max(J), max(J)/2)
  Znew <- ifelse(J %in% z.sim == TRUE, 1, 0)
  lm1<-lm(Y~Znew)
  lm1rcse<-rcse(lm1,J)
  lm1ci<-confint.HC(b=coef(lm1), thevcov=lm1rcse[[1]], lm1$df.residual)["Znew",]
  ## Is zero in the CI?
  zeroInCIlm<- 0 >= lm1ci[1] & 0 <= lm1ci[2]
  ## Make cluster level data following Hansen and Bowers 2008
  Yj<-tapply(Y,J,sum)
  Zj<-tapply(Znew,J,fastmean)
  m0<-unique(table(J))
  ## Do hbtest
  zeroNotRej<-hbtest(x=Yj,z=Zj,m0=m0)>=.05
  return(c(estate=coef(lm1)["Znew"],zeroInCIlm=zeroInCIlm,zeroNotRej=zeroNotRej))
}
set.seed(12345)
pretendDatJ4<-ClusteredData(4,10,.25,.1)
set.seed(12345)
pretendDatJ10<-ClusteredData(10,10,.25,.1)
set.seed(12345)
pretendDatJ30<-ClusteredData(30,10,.25,.1)
set.seed(12345)
pretendDatJ100<-ClusteredData(100,10,.25,.1)
J4res<-replicate(10000, simAteZeroClustered(J=pretendDatJ4$J, Y=pretendDatJ4$Y))
J4error<-apply(J4res,1,mean)[2:3]
J10res<-replicate(10000, simAteZeroClustered(J=pretendDatJ10$J, Y=pretendDatJ10$Y))
J10error<-apply(J10res,1,mean)[2:3]
J30res<-replicate(10000, simAteZeroClustered(J=pretendDatJ30$J, Y=pretendDatJ30$Y))
J30error<-apply(J30res,1,mean)[2:3]
J100res<-replicate(10000, simAteZeroClustered(J=pretendDatJ100$J, Y=pretendDatJ100$Y))
J100error<-apply(J100res,1,mean)[2:3]
```
```{r}
desmat<-rbind(J4error,
J10error,
J30error,
J100error)
colnames(desmat)<-c("OLS+RCSE","Cluster-level")
print(desmat)
```
When the number of clusters is very small ($J=4$) the cluster-level approach is
conservative but the individual-level approach is overly liberal: the 95\% CI
would exclude the truth 100-66=34% of the time rather than only 100-95=5% of
the time (or 0% of the time in the case of the cluster-level approach). As the
number of clusters increases, the performance of both design-based statistical
inference procedures improves, although the RCSE approach continues to be
liberal compared to the cluster-level approach.
# Statistical inference for the average treatment effect in cluster randomized experiments part 2: Model-Based Approaches
## Why call an approach Model-Based versus Design-Based?
We have shown two approaches to statistical inference about the average
treatment effect which require (1) that treatment was randomized as
planned and (2) that the treatment assigned to one unit did not change the
potential outcomes for any other unit. To clarify concepts and code we also
assumed no blocking, no covariates (which could be used to increase precision),
continuous outcomes, no missing outcomes, and equal sized clusters. We also
focused on the "intent to treat" effect. The design-based approach can be
extended to handle more complex designs.^[For example, @hansenbowers2009att
analyze a cluster-randomized field experiment with one-sided non-compliance and
a binary outcome using a design-based approach. And @small2008rig show how to
handle more complex hypotheses about effects in cluster-randomized trials with a
small number of clusters, covariance adjustment, and non-compliance.] When
designs are more complicated, deriving the expressions for standard errors can
become difficult. An alternative approach is to directly model both the
outcome (often, for continuous outcomes, as a Normal random variable) *and*
the cluster-to-cluster differences in the outcome in the control group (often,
as a Normal random variable, too) using a multilevel model like the following.
We can write this model in matrix form as follows with $\bY$ the vector of
observed outcomes which depends on individual level covariates in the matrix,
$\bX$, and a variance-covariance matrix representing relationships among
individuals within clusters of $\bSigma_{Y}$, and another model of the
cluster-to-cluster variation in the coefficients $\bbeta$ (which, here, depends
on cluster-level variables and another variance-covariance matrix describing
relations among clusters). @raudenbush:1997, @raudenbush2000spa, @hong2006evaluating, and @raudenbush2007strategies all grapple with issues of
design and causal inference under model-based approaches to the study of
cluster-randomized trials in the context of education.
\[
\begin{aligned}
\bY|\bX,\bbeta,\bSigma_Y &\sim N \left( \bX \bbeta, \bSigma_{Y} \right) \label{eq:likeli} \\
\bbeta|\bZ,\bgamma,\bSigma_{\beta} & \sim N \left( \bZ \bgamma, \bSigma_{\beta} \right)
\end{aligned}
\]
We can write a piece of this model in scalar form to show that, in our simple designs here, $\bX=1$, that $\bbeta$ refers only to the cluster-specific intercept, and that $\bgamma$ refers to the overall intercept and treatment effect.
$$
\begin{alignat}{1}
Y_{ij}=&\beta_{0j}+\varepsilon_{ij}\\
&\beta_{0j}=\gamma_{00}+\gamma_{01} Z_{j}+ \nu_{0j}\\
\end{alignat}
$$
These kinds of Bayesian-inspired models imply a particular kind of weighting across clusters and blocks which arises as a by-product of Bayes' Rule.^[See @gelman2007dau for an introduction to multilevel models, including in the context of statistical inference for counterfactual causal effects, and @gelman2013bayesian (Chap 8) for an account linking Bayesian approaches to statistical inference to design and especially randomization.] In particular, in this Normal-likelihood and Normal-cluster-prior model, the weights take the following form, where, as we can see, the distribution of $\bbeta$ depends on the different relationships at both the individual and the cluster level.
\[
\begin{aligned}
\by|\bX,\bbeta,\bSigma_Y,\bgamma,\bZ,\bSigma_{\beta} \sim & N \left( \bX \bZ \bgamma, \bSigma_Y + \bX \bSigma_{\beta} \bX^{T} \right)\\
\bbeta|\by,\bZ,\bgamma,\bSigma_{\beta},\bSigma_{\gamma},\bSigma_{Y} \sim N \Biggl( & \begin{aligned}
& (\bX^{T} \bSigma_Y^{-1} \bX +\bSigma^{-1}_{\beta})^{-1}( \bX^{T} \bSigma_Y^{-1} \by + \bSigma_{\beta}^{-1} \bZ \bgamma),\\
& \qquad (\bX^{T} \bSigma_Y^{-1} \bX+\bSigma_{\beta}^{-1})^{-1} \end{aligned} \Biggr)
\end{aligned}
\]
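To see the precision weighting at work, here is a minimal numerical sketch of the scalar, one-cluster analogue of the posterior mean above (all of the variance components and means below are made-up values chosen only for illustration): the posterior mean of a cluster's intercept is a precision-weighted average of that cluster's observed mean and the cluster-level regression (prior) prediction.
```{r shrinkagesketch}
## Precision-weighted (partial pooling) posterior mean for one cluster's intercept:
## hypothetical values chosen only for illustration
sigma2_y  <- 1    ## within-cluster (individual-level) variance
sigma2_b  <- 0.25 ## between-cluster variance (the "prior" variance for the intercept)
n_j       <- 10   ## individuals in the cluster
ybar_j    <- 0.8  ## observed cluster mean
prior_j   <- 0.2  ## cluster-level regression prediction (e.g., gamma00 + gamma01*Z_j)
w_data    <- n_j / sigma2_y ## precision of the cluster's own data
w_prior   <- 1 / sigma2_b   ## precision of the cluster-level model
post_mean <- (w_data * ybar_j + w_prior * prior_j) / (w_data + w_prior)
post_mean ## pulled ("shrunk") from .8 toward .2, more so when n_j or sigma2_b is small
```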
Notice that the flavor of these expressions differs from those we discussed
above. In the design-based approach we referred to repetitions of the
experiment to derive and check expressions for the variance of estimates that
accounted for cluster-level assignment. In the model-based approach we state
that the outcomes were generated according to a probability model (here a
Normal model) and that the cluster-level relationships also follow a
probability model (here, also a Normal model). It is sometimes simpler to state
such models --- even if we do not believe the models as scientific descriptions
of a known process --- and then to assess their characteristics (say, their
error rates and power) than it is to derive new expressions for design-based
estimators when designs are complex. Furthermore, in some situations where
clusters (like schools) are many and are a random sample of a population of
such clusters, the Normal models may describe the outcomes and
cluster-to-cluster variation process well.^[See @barnard2003psa for an example
of an application of this approach when we have other problems such as missing
outcomes, or complex non-compliance.]
## How can we produce model-based estimates of the ATE?
Here we show that the estimated effect is the same whether we use a simple
difference of means (via OLS) or a multilevel model in our very simplified
cluster randomized trial setup.
```{r modelbased, cache=FALSE}
library(lme4)
lm4<-lm(Y~Zi,data=pretendDatJ30)
lmer1<-lmer(Y~Zi+(1|J),data=pretendDatJ30, control=lmerControl(optimizer='bobyqa'),REML=TRUE)
c(OLS=coef(lm4)["Zi"],Multilevel=fixef(lmer1)["Zi"])
```
The confidence intervals differ even though the estimates are the same --- and
there is more than one way to calculate confidence intervals and hypothesis
tests for multilevel models. The software in R (@Bates:2014aa, @Bates:2014ab) includes three methods by
default and @gelman2007dau recommend MCMC sampling from the implied
posterior. Here we focus on the Wald method only because it is the fastest to
compute. Other methods might show other performance when we evaluate these
methods below.
```{r modelcis, cache=TRUE}
lm4ci<-confint.HC(b=coef(lm4), thevcov=rcse(lm4,pretendDatJ30$J)[[1]], lm4$df.residual)["Zi",]
lmer1ciWald<-lme4:::confint.merMod(lmer1,parm="Zi",method="Wald")["Zi",]
lmer1ciProfile<-lme4:::confint.merMod(lmer1,parm=4,method="profile")["Zi",]
rbind(DesignBasedCI=lm4ci,
ModelBasedWaldCI=lmer1ciWald,
ModelBasedProfileCI=lmer1ciProfile)
```
We can calculate an estimate of the ICC directly from the model quantities: the variance of the Normal distribution representing the cluster-to-cluster differences in the intercept divided by the total variance (that cluster-level variance plus the individual-level residual variance).
```{r}
VC<-as.data.frame(lme4:::VarCorr.merMod(lmer1))
VC$vcov[1]/sum(VC$vcov)
```
## How do model-based approaches perform? Validity
We showed that the large-sample/Normal theory design-based statistical
inference did not produce valid confidence intervals and statistical tests
until there were at least 30, if not more, clusters. Here we compare the design-based
approach to the model-based approach (using the Wald method for calculating
confidence intervals). Although the outcome-model based approach is Bayesian
in structure, we can evaluate its properties across repetitions of the design.
We do this here.
```{r modelci, cache=TRUE}
simAteZeroLmer<-function(J,Y){
  ## Make the true relationship equal zero by shuffling Z but not revealing new potential outcomes
  z.sim <- sample(1:max(J), max(J)/2)
  Znew <- ifelse(J %in% z.sim == TRUE, 1, 0)
  thelmer<-lmer(Y~Znew+(1|J), control=lmerControl(optimizer='bobyqa'),REML=FALSE)
  lmer1ciWald<-lme4:::confint.merMod(thelmer,parm="Znew",method="Wald")
  ## lmer1ciProfile<-lme4:::confint.merMod(thelmer,parm=4,method="profile")
  ## Is zero in the CI?
  zeroInCIWald <- 0 >= lmer1ciWald[1] & 0 <= lmer1ciWald[2]
  ## zeroInCIProfile <- 0 >= lmer1ciProfile[1] & 0 <= lmer1ciProfile[2]
  return(c(estate=fixef(thelmer)["Znew"],
           zeroInCIWald=zeroInCIWald #, zeroInCIProfile=zeroInCIProfile
           ))
}
J4resLmer<-replicate(1000, simAteZeroLmer(J=pretendDatJ4$J, Y=pretendDatJ4$Y))
J4errorMLM<-apply(J4resLmer,1,mean)
J10res<-replicate(1000, simAteZeroLmer(J=pretendDatJ10$J, Y=pretendDatJ10$Y))
J10errorMLM<-apply(J10res,1,mean)
J30res<-replicate(1000, simAteZeroLmer(J=pretendDatJ30$J, Y=pretendDatJ30$Y))
J30errorMLM<-apply(J30res,1,mean)
J100res<-replicate(1000, simAteZeroLmer(J=pretendDatJ100$J, Y=pretendDatJ100$Y))
J100errorMLM<-apply(J100res,1,mean)
```
Here are the coverage rates (the proportion of simulations in which the true null of no effect was not rejected) for the three approaches to statistical inference in cluster-randomized experiments as the number of clusters increases in our simple example study. Recall that valid tests would have coverage of at least .95 (within 2 simulation standard errors) --- that is, a correct null hypothesis would be rejected no more than about 5% of the time.
```{r compareerrors}
twosimse<-2*sqrt( .05*.95/1000)
print(twosimse)
mat<-cbind("Multilevel"= c(J4errorMLM[2], J10errorMLM[2],
J30errorMLM[2], J100errorMLM[2]),
desmat
)
row.names(mat)<-paste("J=",c(4,10,30,100),sep="")
mat
```
In our simple setup, the individual-level approaches behave about the same way:
neither the design-based nor the model-based approach produces valid
statistical inferences until the number of clusters is at least 30. This makes
sense: both approaches rely on central limit theorems so that a Normal law can
describe the distribution of the test statistic under the null hypothesis. The
cluster-level approach is always valid, but sometimes produces overly large
confidence intervals (when the number of clusters is small). When the number of
clusters is large (say, 100), then all approaches are equivalent in terms of
their error rates. Researchers whose designs have few clusters should consider either the cluster-level approach using the Normal approximation shown here or even direct, permutation-based approaches to statistical inference (sketched below). Of course, the code in this guide can be used to assess concerns about test validity at different numbers of clusters.
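For example, here is a minimal sketch of a direct permutation test at the cluster level, re-using the cluster-level data (`clusterDat`) and the difference-of-means function (`dp3`) defined above: the p-value comes from comparing the observed statistic to its distribution over repeated random re-assignments of treatment to clusters, with no Normal approximation.
```{r permsketch}
## A direct permutation test at the cluster level, re-using clusterDat and dp3() from above
permTestCluster <- function(Yj, Zj, m0, nsims = 10000){
  ## Observed difference of means (dp3 from above, assuming equal-sized clusters)
  obs <- dp3(x = Yj, z = Zj, m0 = m0)[1, 1]
  ## Reference distribution: shuffle the cluster-level treatment assignment many times
  permdist <- replicate(nsims, dp3(x = Yj, z = sample(Zj), m0 = m0)[1, 1])
  ## Two-sided p-value: how often is a re-assignment at least as extreme as what we observed?
  min(1, 2 * min(mean(permdist >= obs), mean(permdist <= obs)))
}
set.seed(12345)
with(clusterDat, permTestCluster(Yj = Yj, Zj = Zj, m0 = unique(mj)))
```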
# Designing powerful cluster randomized studies: more small clusters are better than fewer larger ones
We want designs which enable statistical tests to rarely discount hypotheses consistent with the data and to often reject hypotheses inconsistent with the data. Textbooks often say this by referring to the desire to reject false hypotheses often (a desire for powerful tests) and to reject true hypotheses only rarely (a desire for valid tests). We've seen that clustered designs
with few clusters can cause statistical tests to reject data-consistent
hypotheses too frequently. That is, the assumptions required for the validity
of common tests (typically, large numbers of observations, or large quantities
of information in general) are challenged by clustered designs, and the tests
which account for clustering can be invalid if the number of clusters is small
(or information is low at the cluster level in general). We've also seen that
we can produce valid statistical tests for hypotheses about the average
treatment effect using either Robust Clustered Standard Errors (RCSE) or Multilevel models (when the number of clusters is large enough) or using the cluster-level approach described by @hansen2008.
Valid tests are a precondition for thinking about powerful tests. Since we
know how to produce valid statistical inferences, we can then ask whether we
can enhance our ability to detect average treatment effects (to distinguish
them from zero) using design.
The most important rule regarding the statistical power of clustered designs is
that more small clusters are better than fewer larger ones. We will
demonstrate this here. However, for some intuition consider again the formula
for the effective sample size with equal sized clusters and no blocking.
$$\text{effective N}=\frac{Jn}{1+(n-1)\rho}.$$
Note that the size of each cluster, $n$, appears in both the numerator and the denominator, while the number of clusters, $J$, appears only in the numerator. Increasing the size of each cluster, then, will do very little to increase our power unless the within-cluster dependence ($\rho$) is very small; the short illustration below makes the trade-off concrete.
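As a quick check, compare the effective N of two designs with the same total number of individuals (the value $\rho=.1$ below is just an illustrative choice):
```{r effntradeoff}
effectiveN <- function(J, n, rho){ (J * n) / (1 + (n - 1) * rho) }
## Same total sample size N = 1000 and rho = .1:
## many small clusters retain far more effective information than few large ones
c(fewLargeClusters  = effectiveN(J = 10,  n = 100, rho = .1),
  manySmallClusters = effectiveN(J = 100, n = 10,  rho = .1))
```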
Here we demonstrate two approaches to calculating the power of
cluster-randomized research designs: a fast approach based on closed-form
expressions for the power of tests from multilevel models and a slow approach
based on simulation. A closed-form approach based on the expressions from the
RCSE and the cluster-level approaches would also be possible. We do not provide
those expressions here, for now, since the Multilevel approach appears both
more complex in terms of deriving power formulas and roughly equivalent to the
RCSE approach in terms of validity based on our previous simulations presented
above.
## Analytic Model-Based Power Analysis
@raudenbush2000spa and @raudenbush:1997 provide an expression for the power of
a multilevel-model based test that enables quick evaluation of different
multilevel design proposals defined by expected intracluster correlation
($\rho$), the fixed false positive rate ($\alpha$), the effect size (Cohen's $d$, a standardized measure of the treatment effect that is common in psychology and educational research --- it is roughly twice the standardized OLS $\beta$), the number of outcome units
($n$), and the number of assignment units ($J$). See @spybrook2011optimal for
detailed explanations of these expressions.
```{r raudpow}
crtpow2<-function(alpha, d, rho, n, J){
  # Raudenbush (1997) power function see for details: http://hlmsoft.net/od/od-manual-20111016-v300.pdf
  ## rho=intraclass correlation [i.e. homogeneity within level-2 units]
  ## alpha=alpha level (here, power=power to detect an effect size of d at a=.05-->95% confidence level)
  ## d=effect size [note: d=.2 is like standardized OLS beta=.1, see Snijders and Bosker(1999), page 147 and below]
  ## n=number of level-1 units per level-2 unit
  ## J=number of level-2 units
  cpow <- 1 - pt(q=qt(1 - alpha, J - 2), df=J - 2, ncp=d/(sqrt((4 * (rho + (1 - rho)/ n))/J)))
  cpow
}
```
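As a quick illustration of how the function can be used (the particular values below are arbitrary), compare the power of two designs with the same total sample size:
```{r crtpow2demo}
## Power for few large clusters versus many small clusters at the same total N
## (more, smaller clusters yield more power here)
c(fewLarge  = crtpow2(alpha=.05, d=.3, rho=.1, n=100, J=10),
  manySmall = crtpow2(alpha=.05, d=.3, rho=.1, n=10,  J=100))
```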
This function uses quantities that we have already discussed. However, the effect size is defined as the difference of means standardized by the sample standard deviation of the outcome. Here we explore how this effect size relates to other quantities in order to provide some intuition to guide use of the formula:
```{r}
## About effect sizes measured as Cohen's d. How do they relate to a standardized mean difference?
## The unstandardized effect is the simple difference of means:
lmtemp<-lm(Y~Zi,data=pretendDat)
coef(lmtemp)["Zi"]
## Simple estATE
with(pretendDat,{ mean(Y[Zi==1]) - mean(Y[Zi==0]) })
## Here are two ways to think of the standardized effect size used in crtpow2()
## Cohen's d is this standardized effect
my.pooled.sd<-function(x,z){
  ## x is the numeric variable
  ## z is binary treatment
  ### the.vars <- tapply(x, z, var, na.rm = TRUE)
  ### the.pooled.sd <- sqrt(sum(the.vars)/2)
  ## this next is faster and taken from pairwise.t.test
  s <- tapply(x, z, sd, na.rm = TRUE)
  nobs <- tapply(!is.na(x), z, sum)
  degf <- nobs - 1
  total.degf <- sum(degf)
  return(sqrt(sum(s^2 * degf)/total.degf))
}
thed<-with(pretendDat,{ ( mean(Y[Zi==1])-mean(Y[Zi==0]) ) / my.pooled.sd(Y,Zi) })
thed
## or following Section 7 in Spybrook et al 2011
with(pretendDat,{ ( mean(Y[Zi==1])-mean(Y[Zi==0]) ) / sd(Y) })
## This standardized effect is different from the standardized regression coefficient in which treatment itself is standardized
## Standardized diff of means (regression beta)
lm.beta<-function(obj,term){
  ## from QuantPsyc
  b <- coef(obj)[term]
  sx <- sd(obj$model[[term]])
  sy <- sd(model.response(obj$model))
  beta <- b * sx/sy
  return(beta)
}
thetau<-lm.beta(lmtemp,"Zi")
thetau
## Another version of the standardized regression coef:
coef(lm(scale(Y)~scale(Zi),data=pretendDat))[2]
## So, a d of about .128 is a standardized reg coef of about .064
(thed/thetau)[[1]]
## So to use this function thinking in terms of standardized regression coefficients, one would double the desired coefficient.
## crtpow2(alpha=.05,d=.5,rho=0.3,n=10,J=50)
```
Now we will watch how power changes as we trade $n$ for $J$ and vary $\rho$ for $d=.3 \approx \tau=.15$.
```{r applycrt2pow2,echo=F}
TheJs <- c(3,5,10,15,20,25,30,35,40,45,50)
powerJrho0.01 <- rep(NA,length(TheJs))
for (i in 1:length(TheJs)){