[R-sig-eco] Bootstrapping with pseudo-replicates

Lars Westerberg lawes at ifm.liu.se
Fri Nov 18 09:08:52 CET 2011


I mainly use two ways to collect results in a for-loop. The first is to 
define an empty result variable and let it grow:
out <- NA  #is ugly but makes things really easy:
out[1] <- 1 #even out[3] <- 1 works
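
To make the growing approach concrete, here is a minimal sketch (the
values and the names 'res' and 'res_mat' are made up for illustration).
Assigning past the end of a vector pads the gap with NA, and the same
trick works with a list when each iteration returns a whole vector:

```r
out <- NA          # start with a single NA
out[3] <- 1        # R extends the vector and pads the gap with NA
print(out)

# Growing a list works the same way when each step yields a vector
res <- list()
for (i in 1:3) {
  res[[i]] <- c(a = i, b = i^2)   # hypothetical per-iteration result
}
res_mat <- do.call(rbind, res)    # one row per iteration, one column per value
print(res_mat)
```

Growing an object this way is convenient, but R may copy it each time it
is extended, so it gets slow for many iterations.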

The second way, which is better and faster, is to allocate memory for 
the result variable beforehand using e.g. 'array' or 'matrix'. In the 
example below, the number of columns must match the length of the 
result vector stored in each iteration:
out <- matrix(NA, nrow=R, ncol=4) #define out before for-loop
#for(i in 1:R){
#...
out[i,] <- c(coef(model),... #store results
#...
#}
apply(out,2,mean,na.rm=TRUE) #Calc mean of matrix columns/reg. param.
apply(out,2,sd,na.rm=TRUE) #Calc sd of matrix columns/reg. param.
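
For reference, a self-contained sketch of the preallocated version using
the data from the original question. Several things here are my
assumptions and not part of the original code: the column names, the
set.seed() call, the helper object 'rows_by_group', and the use of base R
split()/sample.int() in place of ddply(). The result vector from the
question has five elements, so the matrix gets five columns:

```r
set.seed(1)  # assumption: only here to make the sketch reproducible

# Data from the original question
y     <- c(1, 5, 6, 2, 5, 10)             # response
x     <- c(2, 12, 8, 1, 16, 17)           # predictor
group <- factor(c(1, 2, 2, 3, 4, 4))      # group
df    <- data.frame(y, x, group)

R <- 50                                   # number of replicates
# Five values per replicate; column names are illustrative
out <- matrix(NA_real_, nrow = R, ncol = 5,
              dimnames = list(NULL, c("intercept", "slope", "p_slope",
                                      "p_model", "r2")))

# Row indices of df, split by group
rows_by_group <- split(seq_len(nrow(df)), df$group)

for (i in seq_len(R)) {
  # Draw one row per group; indexing via sample.int() avoids the
  # sample(n, 1) surprise when a group has a single row
  pick <- vapply(rows_by_group,
                 function(idx) idx[sample.int(length(idx), 1)],
                 integer(1))
  s <- summary(lm(y ~ x, data = df[pick, ]))
  f <- s$fstatistic
  out[i, ] <- c(s$coefficients[, 1],                       # intercept, slope
                s$coefficients[-1, 4],                     # p-value of slope
                pf(f[1], f[2], f[3], lower.tail = FALSE),  # overall p-value
                s$r.squared)                               # R-squared
}

apply(out, 2, mean, na.rm = TRUE)   # mean of each stored quantity
apply(out, 2, sd, na.rm = TRUE)     # sd of each stored quantity
```

With a single predictor the slope p-value and the overall p-value
coincide, which is a quick sanity check on the stored columns.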

HTH
/Lars

On 2011-11-17 16:02, Johannes Radinger wrote:
> Hello Dixon,
>
> As there is no predefined function for the resampling and regression I want, I tried to write my own code. So far I have code that works in a for-loop. There is still a problem: I don't know how to collect the result vector from each loop step into a data frame, list, etc.
>
> Maybe someone can help me. So far I have the following:
>
> library(plyr)
>
> y<- c(1,5,6,2,5,10) # response
> x<- c(2,12,8,1,16,17) # predictor
> group<- factor(c(1,2,2,3,4,4)) # group
> df<- data.frame(y,x,group)
>
>
>
> R = 50                                      # the number of replicates
> out = numeric(R)                          	# storage for the results
> for (i in 1:R) {
> 	subsample<- ddply(df, .(group), function(x){
> 	x[sample(nrow(x), 1), ]})
> 	model<- lm(y~x,data=subsample)
> 	out[i]<- c(coef(model),				#vector of coefficients
> 	summary(model)$coefficients[-1,4],		#p-values for all except Intercept
> 	pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
> 	summary(model)$fstatistic[3], lower.tail = FALSE),		#overall p-value
> 	summary(model)$r.squared)
> 	}
>
>
> The problem is the object out. It must be a data frame or a list in which all the resulting out[i] vectors are collected. I want it in a form that lets me easily calculate the mean/variance of the individual regression parameters etc.
>
>
> Thank you very much!
> Johannes
>
> -------- Original Message --------
>> Date: Wed, 16 Nov 2011 08:25:11 -0600
>> From: "Dixon, Philip M [STAT]"<pdixon at iastate.edu>
>> To: "r-sig-ecology at r-project.org"<r-sig-ecology at r-project.org>
>> Subject: [R-sig-eco] Bootstrapping with pseudo-replicates
>
>> Johannes,
>>
>> A very good question to ask, but you can't use a bootstrap, or boot(), to
>> investigate it.
>>
>> You can define strata and then bootstrap observations within strata, but
>> all bootstrap data sets will have the same structure as the original data.
>> That's the point of the bootstrap.  In your example, you have observations
>> from 4 sites, 1 obs from site 1, 2 from site 2, 1 from site 3, and 2 from
>> site 4.  Every stratified bootstrap sample will have 1 from site 1, 2 from
>> site 2, 1 from site 3 and 2 from site 4.
>>
>> I believe you have to construct your own code, probably along the lines of
>> defining a vector for one obs per site, then for each site: extracting the
>> set of pseudoreplicates for one site, using sample() to grab one value
>> from that set, then storing in the  vector.
>>
>> Best wishes,
>> Philip Dixon
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>


