Hello, I was attempting to replicate the voting study application in your paper, and I noticed an issue in the parse_data.R file that caused my replicated results not to match the results in the paper.
The issue is that the seed is not set in parse_data.R before sampling from DF.nona. The modified version below sets it with set.seed(1):
```r
rm(list = ls())
set.seed(1)

# the full dataset is available from
# https://github.com/gsbDBI/ExperimentData/tree/master/Mobilization/ProcessedData
data = read.csv("mobilization_no_unlisted 2.csv")

# W is intent to treat
# contact is received treatment
covariates = c("persons", "state", "county", "competiv",
               "st_sen", "st_hse", "newreg", "vote98",
               "vote00", "age", "female")

X = data[,which(names(data) %in% covariates)]
W = data$W
received_treatment = data$contact
Y = data$vote02

DF = data.frame(X, Y, W)
DF.nona = DF[!is.na(rowSums(DF)),]
idx.all = sample(c(sample(which(DF.nona$W == 0), sum(DF.nona$W) * 3/2), which(DF.nona$W == 1)))
DF.subset = DF.nona[idx.all,]

write.csv(DF.subset, "data_clean.csv", row.names = FALSE)
```

Additionally, I found this function useful for directly recreating the cleaned data.
```r
make_data <- function(){
  # download and unpack the processed data into temporary locations
  temp_zip <- tempfile()
  temp <- tempfile()
  download.file("https://github.com/gsbDBI/ExperimentData/raw/master/Mobilization/ProcessedData/mobilization_no_unlisted.zip", temp_zip)
  unzip(zipfile = temp_zip, exdir = temp)
  data = read.csv(file.path(temp,"mobilization_no_unlisted.csv"))
  # temp is a directory, so it needs recursive = TRUE to be removed
  unlink(c(temp, temp_zip), recursive = TRUE)

  covariates = c("persons", "state", "county", "competiv",
                 "st_sen", "st_hse", "newreg", "vote98",
                 "vote00", "age", "female")

  X = data[,which(names(data) %in% covariates)]
  W = data$W
  received_treatment = data$contact
  Y = data$vote02

  DF = data.frame(X, Y, W)
  DF.nona = DF[!is.na(rowSums(DF)),]
  idx.all = sample(c(sample(which(DF.nona$W == 0), sum(DF.nona$W) * 3/2), which(DF.nona$W == 1)))
  DF.subset = DF.nona[idx.all,]

  # return the cleaned subset visibly
  DF.subset
}
```

Using this modified version of parse_data.R, I successfully replicated the results, except for the boosting MSE. I think that difference is due to variability in the boosting algorithm?
| Method | Reported MSE | Replicated MSE |
|---|---|---|
| Boosting | 0.00079 | 0.00123 |
| Lasso | 0.00047 | 0.00047 |
| Single Lasso | 0.0006 | 0.00061 |
| BART | 0.00409 | 0.00405 |
| var(tau) | 0.01615 | 0.016 |
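
For anyone wondering why the seed matters here, below is a minimal sketch on toy data. The vector `W` and the helper `draw_idx` are hypothetical stand-ins, not part of parse_data.R. It reuses the same nested `sample()` pattern as the script and shows that setting the same seed before sampling selects the same rows in the same order.

```r
# Toy stand-in for DF.nona$W (hypothetical data, not the voting dataset):
# 20 control units and 4 treated units.
W <- rep(c(0, 1), times = c(20, 4))

# Same subsampling pattern as parse_data.R: keep every treated row plus
# 3/2 as many control rows, then shuffle the combined indices.
# draw_idx is a hypothetical helper introduced only for this illustration.
draw_idx <- function(seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(c(sample(which(W == 0), sum(W) * 3/2), which(W == 1)))
}

a <- draw_idx(seed = 1)
b <- draw_idx(seed = 1)

identical(a, b)  # TRUE: same seed, same rows in the same order
length(a)        # 10: 6 sampled controls + 4 treated
```

Without a fixed seed, two calls to `draw_idx()` will generally pick different control rows, so any downstream MSE computed on the subset can shift from run to run.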