Skip to content

Commit 57de6a1

Browse files
committed
Removing dependence on the ri package and adding ri2 and coin
1 parent 79bd751 commit 57de6a1

File tree

1 file changed

+59
-27
lines changed

1 file changed

+59
-27
lines changed

guides/analysis-procedures/randomization-inference_en.qmd

Lines changed: 59 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
22
title: "10 Things to Know About Randomization Inference^[We focus here on randomization inference as applied to hypothesis testing. Randomization inference may also be used for construction of confidence intervals, but this application requires stronger assumptions. See @gerber_green_2012, chapter 3.]"
33
author:
4-
- name: "Donald Green^[I am grateful to Winston Lin and Gareth Nellis, who commented on an earlier draft.]"
4+
- name: "Donald Green^[Originating author: Don Green. Thanks to Winston Lin and Gareth Nellis, who commented on an earlier draft. Revisions: Jake Bowers, 8 July 2025. The guide is a live document and subject to updating by EGAP members at any time.]"
55
url: https://egap.org/member/donald-green/
66
image: randomization-inference.png
77
bibliography: randomization-inference.bib
@@ -20,36 +20,68 @@ After we have conducted an experiment, we observe outcomes for the control group
2020

2121
```{r, message = F, warning = F}
2222
# Worked example of randomization inference
23-
rm(list=ls()) # clear objects in memory
24-
library(ri) # load the RI package
23+
library(ri2) # load the RI2 package
24+
library(coin) # load the coin package
25+
2526
set.seed(1234567) # random number seed, so that results are reproducible
2627
# Data are from Table 2-1, Gerber and Green (2012)
27-
Y0 <- c(10, 15, 20, 20, 10, 15, 15)
28-
Y1 <- c(15, 15, 30, 15, 20, 15, 30)
29-
Z <- c(1,0,0,0,0,0,1) # one possible treatment assignment
30-
Y <- Y1*Z + Y0*(1-Z) # observed outcomes given assignment
31-
probs <- genprobexact(Z,blockvar=NULL) # no blocking is assumed when generating probability of treatment and probs are 2/7 for all units
32-
ate <- estate(Y,Z,prob=probs) # estimate the ATE
33-
perms <- genperms(Z,maxiter=10000,blockvar=NULL) # set the number of simulated random assignments
34-
# show all 21 possible random assignments in which 2 units are treated
35-
perms
36-
# --------------------------------------------------------------------
37-
# estimate sampling dist under the sharp null that tau=0 for all units
38-
# --------------------------------------------------------------------
39-
Ys <- genouts(Y,Z,ate=0) # create potential outcomes under the sharp null of no effect for any unit
40-
# show the apparent potential outcomes under the sharp null
41-
Ys
42-
distout <- gendist(Ys,perms,prob=probs) # generate the sampling distribution based on the implied schedule of potential outcomes implied by the null hypothesis
43-
ate # estimated ATE
44-
sort(distout) # list the distribution of possible estimates under the sharp null of no effect
45-
sum( distout >= ate )/nrow(as.matrix(distout)) # one-tailed comparison used to calculate p-value
46-
sum(abs(distout) >= abs(ate))/nrow(as.matrix(distout)) # two-tailed comparison used to calculate p-value
47-
dispdist(distout,ate) # display p-values, 95% confidence interval, standard error under the null, and graph the sampling distribution under the null
28+
dat <- data.frame(
29+
Y0 = c(10, 15, 20, 20, 10, 15, 15),
30+
Y1 = c(15, 15, 30, 15, 20, 15, 30),
31+
Z = c(1,0,0,0,0,0,1)) # one possible treatment assignment
32+
dat$Y <- with(dat,Y1*Z + Y0*(1-Z)) # observed outcomes given assignment
33+
# Represent the design with 2 units assigned to treatment and 5 to control
34+
declaration <- declare_ra(N = 7, m = 2)
35+
print(declaration)
36+
37+
# Notice that there are 21 ways to assign 2 treatments to 7 total units
38+
# the first way is to assign treatments to units 1 and 2, and the second to units 1 and 3, etc.
39+
combn(7,2)
40+
41+
# Conduct Randomization Inference
42+
# using a difference in means test statistic by default
43+
ri2_out <- conduct_ri(
44+
formula = Y ~ Z,
45+
declaration = declaration,
46+
sharp_hypothesis = 0,
47+
data = dat
48+
)
49+
50+
summary(ri2_out)
51+
52+
# The ingredients:
53+
# An observed test statistic
54+
obs_mean_diff <- with(dat,mean(Y[Z==1])-mean(Y[Z==0]))
55+
56+
# The distribution of the test statistic under the null:
57+
table(ri2_out$sims_df$est_sim)
58+
two_tailed_p <- mean(abs(ri2_out$sims_df$est_sim) >= obs_mean_diff)
59+
two_tailed_p
60+
61+
# Another approach using permutations rather than an exact approach using the coin package
62+
# And using a standardized difference of means (like a t-test) rather than the raw
63+
# difference in means used above
64+
dat$ZF <- factor(1-dat$Z) # oneway_test wants treatment to be a factor
65+
t_test_exact <- oneway_test(Y~ZF,data=dat,distribution=exact())
66+
print(t_test_exact)
67+
pvalue(t_test_exact)
68+
## the standardized difference in means
69+
statistic(t_test_exact)
70+
71+
# The equivalent of the table showing the distribution above
72+
# only here using standardized test statistics
73+
rbind(support(t_test_exact),
74+
# The probabilities of each possible test statistic value under the null.
75+
dperm(t_test_exact,x=support(t_test_exact))*21)
76+
4877
# Compare results to traditional t-test with unequal variance
49-
t.test(Y~Z,
78+
79+
# notice that the results are not the same because the t.test is assuming a
80+
# t-distribution for the null distribution of the test statistic.
81+
t.test(Y~Z,data=dat,
5082
alternative = "less",
5183
mu = 0, var.equal = FALSE)
52-
t.test(Y~Z,
84+
t.test(Y~Z, data=dat,
5385
alternative = "two.sided",
5486
mu = 0, var.equal = FALSE)
5587
```
@@ -98,6 +130,6 @@ On the other hand, randomization inference cannot be applied with additional ass
98130

99131
Old-fashioned approximate methods work well when the assumptions on which the approximations rest are sound. For example, when an experiment involves random assignment of individual subjects, outcomes are distributed more or less symmetrically around the mean, and the number of subjects is greater than 100, the difference between conventional p-values and RI p-values may be negligible. Randomization inference may still be useful as the final word, but it rarely changes inferences substantively under these circumstances. The method is valuable primarily for nonstandard applications in which outcomes are skewed, subject pools are small, or the method of assignment is complex.
100132

101-
Note on available software for implementing randomization. For the latest R package for randomization inference, see [here](http://alexandercoppock.com/ri2/articles/ri2_vignette.html). For randomization inference code specifically tailored to the special features of binary outcomes, see [here](https://cran.r-project.org/web/packages/RI2by2/index.html). Stata users may find an all-purpose package [here](https://github.com/simonheb/ritest).
133+
Note on available software for implementing randomization inference. For the latest R package for randomization inference, see [here](http://alexandercoppock.com/ri2/articles/ri2_vignette.html). For randomization inference code specifically tailored to the special features of binary outcomes, see [here](https://cran.r-project.org/web/packages/RI2by2/index.html). Stata users may find an all-purpose package [here](https://github.com/simonheb/ritest).
102134

103135
# References

0 commit comments

Comments
 (0)