[R] bootstrap syntax + bootstrap for teaching
Tim Hesterberg
timh at insightful.com
Wed Jan 31 20:00:29 CET 2007
Previous subject:
bootstrap bca confidence intervals for large number of statistics in one model; library("boot")
Jacob Wegelin asked for an easier way to do many bootstrap confidence
intervals for regression output.
The syntax would be easier with S+Resample, example below.
You create an ordinary lm object, then do e.g.
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata=NEWDATA)))
limits.bca(boot)
---- RESAMPLING FOR TEACHING ----
I would encourage people to consider using S+Resample for teaching.
There is a free student version of S+, and S+Resample is easier for
students -- easier syntax (in general, not just this example), plus a
menu interface. There are free chapters and packages you can use to
supplement a statistics course with resampling. See:
http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat
I'll give a workshop on resampling for teaching at the USCOTS conference
(U.S. Conference On Teaching Statistics) on May 16.
http://www.causeweb.org/uscots
http://www.causeweb.org/workshop/hesterberg/
set.seed(0)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x,y)
DAT <- data.frame(x,y)
NEWDATA <- data.frame(x=seq(min(x), max(x), length=50))
library(resample)
fit <- lm(y ~ x + x^2, data=DAT)
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
CIs <- limits.bca(boot)
lines(NEWDATA$x, CIs[4:53,1], col="red")
lines(NEWDATA$x, CIs[4:53,4], col="red")
CIs[1:3,]
I used x + x^2 in the model rather than poly(x,2), because poly is
data dependent, so regression coefficients cannot be used for new
data, and confidence intervals for the coefficients are not meaningful.
Comment - the default is 1000 bootstrap samples; that isn't really
enough for BCa intervals, because the BCa calculations magnify the
Monte Carlo standard error by roughly a factor of two.
Jacob Wegelin wrote:
>Sometimes one might like to obtain pointwise bootstrap bias-corrected,
>accelerated (BCA) confidence intervals for a large number of statistics
>computed from a single dataset. For instance, one might like to get
>(so as to plot graphically) bootstrap confidence bands for the fitted
>values in a regression model.
>
>(Example: Chiu S et al., Early Acceleration of Head Circumference in
>Children with Fragile X Syndrome and Autism. Journal of Developmental &
>Behavioral Pediatrics 2007;In press. In this paper we plot the
>estimated trajectories of head circumference for two different
>groups, computed by linear mixed-effects models, with confidence bands
>obtained by bootstrap.)
>
>Below is a toy example of how to do this using the "boot" library.
>We obtain BCA CIs for all three regression parameters and for the fitted
>values at 50 levels of the predictor.
>
>set.seed(1234567)
>x<-runif(150)
>y<-2/3 + pi * x^2 + runif(length(x))/2
>plot(x,y)
>DAT<-data.frame(x,y)
>NEWDATA<-data.frame(x=seq(min(x), max(x), length=50))
>library('boot')
>myfn<-function(data, whichrows) {
> TheseData<-data[whichrows,]
> thisLM<-lm( y~poly(x,2), data=TheseData)
> thisFit<-predict(thisLM, newdata=NEWDATA)
> c(
> coef(summary(thisLM))[,"Estimate"]
> , thisFit)
>}
>bootObj<-boot( data=DAT, statistic=myfn, R=1000 )
>names(bootObj)
>dim(bootObj$t)
>sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] ))
>colnames(sofar)<-c("lo", "hi")
>NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),])
>thecoefs<-sofar[1:3,]
>lines( NEWDATA$x, NEWDATA$lo, col='red')
>lines( NEWDATA$x, NEWDATA$hi, col='red')
>
>Note: To get boot.ci to run with type='bca' it seems necessary to have a
>large value of R. Otherwise boot.ci returns an error, in my (limited)
>experience.
>
>Thanks in advance for any critiques. (For instance, is there an easier or more elegant way?)
Caveat - I helped create S+Resample.
========================================================
| Tim Hesterberg Senior Research Scientist |
| timh at insightful.com Insightful Corp. |
| (206)802-2319 1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax) Seattle, WA 98109-3044, U.S.A. |
| www.insightful.com/Hesterberg |
========================================================
Download S+Resample from www.insightful.com/downloads/libraries
Bootstrap short courses: May 21-22 Philadelphia, Oct 10-11 San Francisco.
2-3 May UK, 3-4 Oct UK.
http://www.insightful.com/services/training.asp
Workshop on resampling for teaching: May 16 Ohio State
http://www.causeweb.org/workshop/hesterberg/
More information about the R-help
mailing list