[R] bootstrapping in regression
Tom Backer Johnsen
backer at psych.uib.no
Thu Jan 29 22:56:52 CET 2009
Tom Backer Johnsen wrote:
> Thomas Mang wrote:
>> Hi,
>>
>> Please forgive me if my questions sound somewhat 'stupid' to the
>> trained and experienced statisticians among you. I am also not sure
>> whether I have used all the terms correctly; if not, corrections are welcome.
>>
>> I have asked myself the following question regarding bootstrapping in
>> regression:
>> Say for whatever reason one does not want to take the p-values for
>> regression coefficients from the established test statistics
>> distributions (t-distr for individual coefficients, F-values for
>> whole-model-comparisons), but instead apply a more robust approach by
>> bootstrapping.
>>
>> In the simple linear regression case, one possibility is to randomly
>> permute the Y values relative to the X values, estimate the model, and
>> take the beta1 coefficient. Doing this many times yields a null
>> distribution for beta1. Finally, compare the beta1 for the observed
>> data against this null distribution.
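[Editor's note: a minimal sketch of the permutation scheme described above, on simulated data; the variable names and sample size are illustrative, not from the thread.]

```r
## Permutation null distribution for beta1 in simple linear regression:
## shuffle y relative to x, refit, and collect the slope each time.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)              # simulated data, for illustration only
obs <- coef(lm(y ~ x))["x"]          # observed slope (beta1)

B <- 999
perm <- replicate(B, coef(lm(sample(y) ~ x))["x"])

## two-sided permutation p-value (+1 counts the observed statistic itself)
p <- (sum(abs(perm) >= abs(obs)) + 1) / (B + 1)
p
```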
>
> There is a very basic difference between bootstrapping and random
> permutations. What you are suggesting is to shuffle values between
> cases or rows in the data frame. That amounts to a variant of a
> permutation test, not a bootstrap.
>
> What you do in a bootstrap test is different: you regard your sample as
> a population and then sample from that population (with replacement),
> normally by drawing a large number of random samples of the same size
> as the original sample and doing the computation of interest for each
> sample.
>
> In other words, with bootstrapping, the pattern of values within each
> case or row is unchanged, and you sample complete cases or rows. With a
> permutation test you keep the original sample of cases or rows, but
> shuffle the observations on the same variable between cases or rows.
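[Editor's note: a small sketch of the case (row) resampling Tom describes, on simulated data; names and sizes are illustrative.]

```r
## Case-resampling bootstrap: each bootstrap sample keeps x/y pairs intact,
## drawing complete rows with replacement.
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 0.5 * d$x + rnorm(50)         # simulated data, for illustration only

B <- 999
boot_slopes <- replicate(B, {
  idx <- sample(nrow(d), replace = TRUE)   # whole rows, with replacement
  coef(lm(y ~ x, data = d[idx, ]))["x"]
})

quantile(boot_slopes, c(0.025, 0.975))     # rough percentile interval for beta1
```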
>
> Have a look at the 'boot' package.
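[Editor's note: the same case-resampling idea expressed with the 'boot' package, which ships with R; the data are simulated and the function name `slope` is illustrative.]

```r
library(boot)

set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 0.5 * d$x + rnorm(50)

## statistic(data, indices): refit the model on the resampled rows
slope <- function(data, idx) coef(lm(y ~ x, data = data[idx, ]))["x"]

b <- boot(d, slope, R = 999)
boot.ci(b, type = "perc")      # percentile bootstrap CI for the slope
```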
>
> Tom
>>
>> What I now wonder is what the situation looks like in the multiple
>> regression case. Assume there are two predictors, X1 and X2. Is it
>> then possible to do the same, but rearrange the values of only one
>> predictor (the one of interest) at a time? Say I again want to
>> test beta1. Is it then valid to randomly rearrange the X1 data many
>> times (keeping Y and X2 as observed), fit the model, take the beta1
>> coefficient, and finally compare the beta1 of the observed data
>> against the distribution of these beta1s?
>> For X2, do the same: randomly rearrange X2 each time while keeping
>> Y and X1 as observed, and so on.
>> Is this valid?
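[Editor's note: a sketch of exactly the scheme Thomas asks about, on simulated data. One caveat worth hedging: permuting X1 against both Y and X2 also destroys any X1/X2 association, which is one reason residual-permutation schemes such as Freedman-Lane are often preferred in multiple regression; the code below only illustrates the question as posed.]

```r
## Permute one predictor (X1) while keeping Y and X2 as observed.
set.seed(1)
n  <- 100
x2 <- rnorm(n)
x1 <- 0.4 * x2 + rnorm(n)            # correlated predictors, for illustration
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(n)

obs <- coef(lm(y ~ x1 + x2))["x1"]   # observed beta1

## second coefficient is the slope on the permuted X1
perm <- replicate(999, coef(lm(y ~ sample(x1) + x2))[2])
p <- (sum(abs(perm) >= abs(obs)) + 1) / (999 + 1)
p
```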
>>
>> Second, if this is valid for the 'normal', fixed-effects-only
>> regression, is it also valid to derive null distributions for the
>> regression coefficients of the fixed effects in a mixed model this
>> way? Or does the quite different parameter-estimation procedure
>> rule out this approach (rule out in the sense of producing a bogus
>> outcome)?
>>
>> Thanks, Thomas
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
--
+----------------------------------------------------------------+
| Tom Backer Johnsen, Psychometrics Unit, Faculty of Psychology |
| University of Bergen, Christies gt. 12, N-5015 Bergen, NORWAY |
| Tel : +47-5558-9185 Fax : +47-5558-9879 |
| Email : backer at psych.uib.no URL : http://www.galton.uib.no/ |
+----------------------------------------------------------------+