[R] Advanced bootstrap question
Michael Ash
mash at econs.umass.edu
Wed Jun 21 17:08:30 CEST 2017
I have an advanced question about bootstrapping.
There are two datasets. In each bootstrap iteration, I would like to sample:
  - one observation per cluster from the first dataset, and
  - N observations with replacement from the second dataset.
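A minimal base-R sketch of one draw under this scheme, assuming N is the number of rows in the second dataset (as in an ordinary bootstrap). The helper name resample_once() is hypothetical and only illustrates the sampling, not how boot() would drive it:

```r
# Hypothetical helper: one bootstrap draw under the two-part scheme.
# first: data frame with a `cluster` column; second: any data frame.
resample_once <- function(first, second) {
  # (1) one row per cluster, chosen uniformly within each cluster
  rows_by_cluster <- split(seq_len(nrow(first)), first$cluster, drop = TRUE)
  pick <- vapply(rows_by_cluster,
                 function(r) r[sample.int(length(r), 1L)],  # safe when length(r) == 1
                 integer(1))
  # (2) N = nrow(second) rows drawn with replacement (assumption about N)
  boot_rows <- sample.int(nrow(second), nrow(second), replace = TRUE)
  list(first  = first[pick, , drop = FALSE],
       second = second[boot_rows, , drop = FALSE])
}

first.df  <- data.frame(cluster = gl(2, 2, 4), z = seq(1, 2))
second.df <- data.frame(y = 1:2)
set.seed(1)
draw <- resample_once(first.df, second.df)
```

Indexing with sample.int() inside each cluster avoids the sample(x) pitfall when a cluster has a single row.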
Right now I am using dplyr::sample_n() for the first dataset, with this
sampling embedded in the statistic function that boot() (from the boot
package) calls to resample the second dataset and produce the estimates.
I would prefer to do the entire sampling within boot() rather than
embedding the sample_n() call. That way, the "original" results would
indeed be computed on the full data rather than on one particular sample
from the first dataset.
Any thoughts on how to implement this? I think it involves using strata
and weights to "fool" boot() into sampling from a concatenation of the two
datasets. The two datasets have entirely different contents (different
variables and different numbers of observations). A minimal working
example (MWE) follows:
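library(boot)
library(car)    # provides the summary() method used below for "boot" objects
library(dplyr)
# First dataset: 2 clusters with 2 observations each
(first.df <- data.frame(cluster = gl(2, 2, 4), z = seq(1, 2)))
# Second dataset: 2 observations
(second.df <- data.frame(y = 1:2))
# Statistic passed to boot(): X is second.df, d the resampled row indices.
# The one-per-cluster draw from first.df happens here, outside boot()'s
# control, so it is also redrawn when boot() computes the "original" value.
boot_script <- function(X, d) {
  zbar <- mean(sample_n(group_by(first.df, cluster), 1)$z)
  return(c(zbar, zbar * mean(X[d, "y"])))
}
## Results based on the original data
(original.zbar <- mean(first.df$z))
mean(original.zbar * second.df[, "y"])
## Bootstrapped results
## Problem: "Original" is itself based on a sampling
for (i in 1:10) {
  b <- boot(second.df, boot_script, R = 100)
  print(summary(b))
}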
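A minimal sketch of the strata idea above, offered as an untested possibility rather than a known solution: stack the two datasets, give each cluster of the first dataset its own stratum and the whole second dataset one further stratum, and let boot() resample within strata. The column names src, val and stratum and the function stacked_stat() are hypothetical, and the sketch uses strata only, not weights:

```r
library(boot)

first.df  <- data.frame(cluster = gl(2, 2, 4), z = seq(1, 2))
second.df <- data.frame(y = 1:2)

# Stack both datasets into one data frame (hypothetical layout): `src` marks
# the origin, `val` holds z or y, and `stratum` is the cluster id for
# first-dataset rows plus one extra stratum shared by all second-dataset rows.
n1 <- nrow(first.df)
n2 <- nrow(second.df)
k  <- nlevels(first.df$cluster)
combined <- data.frame(
  src     = rep(c("first", "second"), c(n1, n2)),
  val     = c(first.df$z, second.df$y),
  stratum = c(as.integer(first.df$cluster), rep(k + 1L, n2))
)

# With strata, boot() fills the index positions of each stratum with draws
# (with replacement) from that same stratum. Keeping only the first position
# of each cluster stratum therefore gives one random row per cluster.
stacked_stat <- function(data, d) {
  is_first   <- data$src == "first"
  pos        <- which(is_first)
  first_draw <- pos[!duplicated(data$stratum[pos])]
  zbar <- mean(data$val[d[first_draw]])
  ybar <- mean(data$val[d[!is_first]])
  c(zbar, zbar * ybar)
}

set.seed(1)
b <- boot(combined, stacked_stat, R = 100, strata = combined$stratum)

# boot() evaluates t0 at the identity indices, which here would keep only the
# first row of each cluster. Overwrite it with the full-data estimates.
zbar_full <- mean(first.df$z)
b$t0 <- c(zbar_full, zbar_full * mean(second.df$y))
```

Overwriting b$t0 is a workaround: because boot() computes the "original" value by calling the same statistic at the identity indices, the full-data estimates have to be supplied separately before calling summary() or boot.ci().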
Thank you very much.
--
Michael Ash, Chair, Department of Economics
Professor of Economics and Public Policy
University of Massachusetts Amherst
Email mash at econs.umass.edu
Tel +1-413-545-4815
Twitter https://twitter.com/michaelaoash
More information about the R-help mailing list