[R-sig-ME] bootstrapping coefficient p-values under null hypothesis in case-resampling multiple linear mixed-effects regression

Aleksander Główka aglowka at stanford.edu
Sun Jan 28 22:12:27 CET 2018


Ben and Robert, thank you for your suggestions! I was not aware of the 
sampling variables approach, but it seems like a very reasonable way to 
proceed. It seems similar in spirit to random forests and boosting.
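For concreteness, here is a minimal base-R sketch of how I understand the variable-sampling idea: case-resample everything except the focal predictor, then resample that predictor independently from its own observed values, so that the bootstrap data obey the null of no association. All data and variable names below are invented for illustration, and lm() stands in for the eventual lmer() fit.

```r
## Sketch of the variable-sampling idea: case-resample all columns except
## the focal predictor x, then resample x independently from its own
## observed values, so the bootstrap data satisfy H0: beta_x = 0.
## Data and variable names are invented for illustration.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n), z = rnorm(n))
dat$y <- 2 + 0.5 * dat$z + rnorm(n)            # y truly unrelated to x

t_obs <- coef(lm(y ~ x + z, data = dat))["x"]  # observed slope for x

R <- 999
t_star <- replicate(R, {
  boot <- dat[sample(n, replace = TRUE), c("y", "z")]  # case-resample (y, z) pairs
  boot$x <- sample(dat$x, n, replace = TRUE)           # resample x on its own
  coef(lm(y ~ x + z, data = boot))["x"]                # slope under the null
})

## two-sided empirical p-value, comparing the observed slope
## to the null distribution of bootstrap slopes
p <- (1 + sum(abs(t_star) >= abs(t_obs))) / (R + 1)
```

With the real mixed model, the resampling would additionally need to respect the grouping structure (e.g. resampling clusters rather than rows).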

In my search I found another strategy, outlined in Godfrey (2009): for 
(multiple) regression, the null hypothesis is that there is no 
relationship between the predictors and the response. This is the case 
when all of the β coefficients are zero, in which case the expected 
response y is also 0. So in practice, sampling under the null hypothesis 
can be implemented by centering all y in the original data around 0 and 
then resampling cases.

Source: Godfrey, Leslie (2009): /Bootstrap Tests for Regression Models/. 
New York: Palgrave Macmillan.
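As a literal transcription of that recipe as I read it (in base R, with lm() standing in for the eventual lmer() fit; data and variable names are invented):

```r
## Literal sketch of the centering recipe: shift y so its mean is 0,
## treat the centered data as satisfying the null, then case-resample
## and refit. Data and variable names are invented for illustration.
set.seed(2)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 0.3 * dat$x + rnorm(n)

b_obs <- coef(lm(y ~ x, data = dat))["x"]  # observed slope

null_dat <- dat
null_dat$y <- dat$y - mean(dat$y)          # center the responses around 0

R <- 999
b_star <- replicate(R, {
  boot <- null_dat[sample(n, replace = TRUE), ]  # resample cases
  coef(lm(y ~ x, data = boot))["x"]              # refit on centered data
})

## empirical two-sided p-value in the style of Davison & Hinkley
p <- (1 + sum(abs(b_star) >= abs(b_obs))) / (R + 1)
```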

Thanks,

Aleksander Główka
PhD Candidate in Linguistics
Stanford University

On 1/27/2018 7:20 PM, Robert LaBudde wrote:
> If your null hypothesis is that variable X has a coefficient of zero 
> in the model, would not sampling under the null hypothesis be done by 
> case-resampling every variable except X and then resampling X from 
> its set of values?
>
> It would appear wise to just do case resampling and construct a 
> confidence interval for the coefficient from the bootstrap. I avoid 
> statistical testing as completely as possible.
>
>
> On 1/27/2018 5:19 PM, Aleksander Główka wrote:
>> Dear mixed-effects community,
>>
>> I am fitting a multiple linear mixed-effects regression model in 
>> lme4. The residual plot shows a near-linear pattern, enough to warrant 
>> not assuming residual homoscedasticity. One way to model regression 
>> without explicitly making this assumption is to use case-resampling 
>> regression (Davison & Hinkley 1997), an application of the bootstrap 
>> (Efron & Tibshirani 1993).
>>
>> In case-resampling regression, rather than assuming a normal 
>> distribution for the test statistic T, we estimate the distribution 
>> of T empirically. We mimic sampling from the original population by 
>> treating the original sample as if it were the population: for each 
>> bootstrap sample of size n, we randomly select n cases with 
>> replacement from the original sample and refit the regression to 
>> obtain coefficient estimates, repeating this procedure R times.
>>
>> Having applied this procedure, I am trying to calculate empirical 
>> p-values for my regression coefficients. As in parametric regression, 
>> I want to conduct a two-tailed significance test for the slope, with 
>> test statistic T, under the null hypothesis H0: β1 = 0. Since we are 
>> treating the original sample as the population, T = t is the observed 
>> value from the original sample. For β0, β1, …, βp we calculate the 
>> p-value as follows:
>>
>> (1) p = min( #{T* ≥ t}/R , #{T* ≤ t}/R ),
>>
>> where #{·} counts the bootstrap replicates T* satisfying the condition.
>>
>> Davison and Hinkley take t = β̂1, so that, in practice,
>>
>> (2) p = min( (#{β̂*1 ≥ β̂1} + 1)/(R + 1) , (#{β̂*1 ≤ β̂1} + 1)/(R + 1) )
>>
>> The major problem here is that the bootstrap samples were not sampled 
>> under the null hypothesis, so in (1) and (2) we are evaluating the 
>> alternative hypothesis rather than the null. Efron & Tibshirani 
>> (1993) indeed caution that all hypothesis testing must be performed 
>> by sampling under the null. This is relatively simple for, say, 
>> testing the difference between two means, where the null is H0: μ1 = μ2, 
>> which requires only a simple transformation of the data prior to 
>> resampling.
>>
>> So my question here is: how do I perform significance testing under 
>> the null hypothesis in case-resampling regression? As far as I could 
>> see, neither Davison & Hinkley (1997) nor Efron & Tibshirani (1993) 
>> seem to mention how to sample under the null. Is there some 
>> adjustment that I can introduce before (to the data) or after 
>> case-resampling (to the least-squares formula) in a way that is 
>> easily implementable in R and lme4? Any ideas and/or algorithms would 
>> be greatly appreciated.
>>
>> N.B. With all due respect, please don’t advise me to fit a GLM 
>> instead or to talk directly with Rob Tibshirani.
>>
>> Thank you,
>>
>> Aleksander Glowka
>> PhD Candidate in Linguistics
>> Stanford University
>>
>> Works cited:
>>
>> Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and their 
>> Applications. Cambridge, England: Cambridge University Press.
>>
>> Efron, B. and Tibshirani, R.J. (1993). An Introduction to the 
>> Bootstrap. New York: Chapman & Hall.
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>




