[R] Bootstrapping the prediction intervals for regression

Bert Gunter gunter.berton at gene.com
Thu May 15 14:43:58 CEST 2014


Please note that this can (and should) be considerably sped up by
taking advantage of the fact that lm() will work on a matrix of
responses. Also, some improvement in speed can usually be obtained by
generating all samples at once rather than generating the sample each
time within a loop.

something like (untested):

nr <- nrow(dat)
ymat <- matrix(with(dat,sample(y,100*nr,rep=TRUE)), nrow =nr)
out <- lm(ymat ~x1+x2,data=dat)
b <- predict(out)

However, why do this anyway, as for linear models the standard closed
form solutions are available?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, May 15, 2014 at 4:01 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Try to follow the example below and see if you can adapt it to your needs.
> Since you don't provide us with a dataset example, I start by making up
> some.
>
>
> # make up some data
> n <- 22
> set.seed(8873)
> dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
> dat$y <- x1 + x2 + rnorm(n)
>
>
> B <- 100  # number of bootstrap samples
> result <- array(dim = c(n, 3, B), dimnames = list(NULL, c("fit", "upr",
> "lwr"), NULL))
> for(i in 1:B){
>         s <- sample(nrow(dat), n, replace = TRUE)
>         lm.tmp <- lm(y ~ x1 + x2, data = dat[s, ])
>         result[,,i] <- predict(lm.tmp, interval = "prediction")
> }
>
>
> Then you can do whatever you want with 'result', including computing the min
> and max values.
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 15-05-2014 10:37, varin sacha escreveu:
>>
>> Dear experts,
>>>
>>>
>>>
>>> I have done a multiple linear regression on a small sample size (n=22).
>>> I have computed the prediction intervals (not the confidence intervals).
>>>
>>>
>>> Now I am trying to bootstrap the prediction intervals.
>>>
>>>
>>> I didn't find any package doing that.
>>> So I decide to create my own R function, but it doesn't work !
>>>
>>>
>>> Here are my R codes :
>>>
>>>
>>> LinearModel.1 <- lm(GDP.per.head ~ Competitivness.score +Quality.score,
>>> data=Dataset)
>>> summary(LinearModel.1)
>>> predict(LinearModel.1, interval = "prediction")
>>>
>>>
>>> HERE IS MY R FUNCTION WHERE I HAVE TRIED TO BOOTSTRAP THE PREDICTION
>>> INTERVALS
>>>
>>>
>>> pred.min<-rep(nrow(Dataset), na.rm=F)
>>> pred.max<-rep(nrow(Dataset), na.rm=F)
>>> for(i in 1:n)
>>> {s<-sample(1:nrow(Dataset),size=22)
>>> reg<-lm(GDP.per.head ~ Competitivness.score +
>>> Quality.score,data=Dataset[s])
>>> pred.min<-pmin(reg,pred.min)
>>> pred.max<-pmax(reg,pred.max)}
>>>
>>>
>>> Thanks for your precious help.
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list