[R-sig-ME] bootstrapping coefficient p-values under null hypothesis in case-resampling multiple linear mixed-effects regression

Sat Jan 27 23:19:35 CET 2018

Dear mixed-effects community,

I am fitting a multiple linear mixed-effects regression model in lme4. 
The residuals show a near-linear pattern, enough to warrant not assuming 
residual homoscedasticity. One way to fit a regression without making 
this assumption is case-resampling regression (Davison & Hinkley 1997), 
an application of the bootstrap (Efron & Tibshirani 1993).

In case-resampling regression, rather than assuming a normal 
distribution for the test statistic $T$, we estimate the distribution of 
$T$ empirically. We mimic sampling from the original population by 
treating the original sample as if it were the population: for each 
bootstrap sample of size $n$ we randomly select $n$ cases (whole rows) 
with replacement from the original sample and refit the regression to 
obtain bootstrap estimates $\hat\beta^*$, repeating this procedure $R$ 
times.
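A minimal sketch of the case-resampling loop just described, in Python with only the standard library. It assumes an ordinary least-squares fit with a single predictor rather than an lme4 mixed model, and the function names `ols_fit` and `case_bootstrap` are hypothetical. With grouped data one would usually resample whole groups (e.g. speakers) rather than individual rows; this sketch does not do that.

```python
import random


def ols_fit(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def case_bootstrap(xs, ys, R=1000, seed=1):
    """Case resampling: draw n (x, y) pairs with replacement, refit, repeat R times."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    while len(reps) < R:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Assumption of this sketch: a resample in which every x is identical
        # has no defined slope, so it is discarded and drawn again.
        if len(set(bx)) < 2:
            continue
        reps.append(ols_fit(bx, by))
    return reps


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
    b0_hat, b1_hat = ols_fit(xs, ys)
    slopes = [b1 for _, b1 in case_bootstrap(xs, ys, R=2000)]
    print(b1_hat, min(slopes), max(slopes))
```

Each element of the returned list is one bootstrap estimate $(\hat\beta^*_0, \hat\beta^*_1)$; collecting the slopes gives the empirical distribution used in the p-value formulas.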

Having applied this procedure, I am trying to calculate empirical 
p-values for my regression coefficients. As in parametric regression, I 
want to conduct the two-tailed test of significance for the slope, with 
test statistic $T$, under the null hypothesis $H_0: \beta_1 = 0$. Since 
we are treating the original sample as the population, $t$ denotes the 
value of $T$ observed in the original sample. For each coefficient 
$\beta_j$, $j = 0, 1, \ldots, p$, we calculate the p-value as follows:

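(1)  $p = 2\min\left( \frac{1}{R}\sum_{r=1}^{R} 1\{T^*_r \ge t\},\ \frac{1}{R}\sum_{r=1}^{R} 1\{T^*_r \le t\} \right)$

where $T^*_r$ is the statistic computed from the $r$-th bootstrap sample 
and the factor of 2 makes the test two-tailed. Davison and Hinkley take 
$t = \hat\beta_1$, so that, in practice,

(2)  $p = 2\min\left( \frac{1 + \sum_{r=1}^{R} 1\{\hat\beta^*_{1,r} \ge \hat\beta_1\}}{R+1},\ \frac{1 + \sum_{r=1}^{R} 1\{\hat\beta^*_{1,r} \le \hat\beta_1\}}{R+1} \right)$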
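A sketch of formula (2) as a function of the $R$ bootstrap replicates of one coefficient. The name `bootstrap_p_value` is hypothetical. The function computes the two tail proportions with Davison and Hinkley's +1 correction, then doubles the smaller one to give a two-sided p-value, capped at 1.

```python
def bootstrap_p_value(reps, observed):
    """Two-sided bootstrap p-value built from formula (2) (hypothetical helper).

    reps: R bootstrap replicates of one coefficient.
    observed: the estimate of that coefficient from the original sample.
    """
    R = len(reps)
    # Upper and lower tail proportions, each with the (count + 1) / (R + 1) correction.
    upper = (sum(1 for b in reps if b >= observed) + 1) / (R + 1)
    lower = (sum(1 for b in reps if b <= observed) + 1) / (R + 1)
    # Double the smaller tail for a two-sided test; a p-value cannot exceed 1.
    return min(1.0, 2 * min(upper, lower))
```

Case-resampled replicates scatter around the observed estimate rather than around zero. Roughly half of them therefore fall on each side of it, and this p-value drifts toward 1 whatever the data, which illustrates the concern about sampling under the null.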

The major problem here is that the bootstrap samples were not drawn 
under the null hypothesis, so in (1) and (2) we are effectively 
evaluating the alternative hypothesis rather than the null. Efron & 
Tibshirani (1993) indeed caution that bootstrap hypothesis tests must be 
performed by sampling under the null. This is relatively simple for, 
say, testing the difference between two means, where the null is 
$H_0: \mu_1 = \mu_2$ and which requires only a simple transformation of 
the data prior to resampling.
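For the two-means case, the transformation Efron & Tibshirani (1993) describe shifts each group so that both share the mean of the combined sample, which makes $H_0$ true in the data being resampled. The sketch below follows that idea. As a simplifying assumption it uses the plain difference in means instead of a studentized statistic, and `shifted_two_sample_test` is a hypothetical name.

```python
import random


def mean(values):
    return sum(values) / len(values)


def shifted_two_sample_test(z, y, B=2000, seed=1):
    """Two-sided bootstrap test of H0: mu_z == mu_y (hypothetical helper).

    Returns (observed difference in means, bootstrap p-value).
    """
    rng = random.Random(seed)
    mz, my = mean(z), mean(y)
    pooled = mean(z + y)
    # Shift both groups to the pooled mean so that H0 holds in the resampled data.
    z_null = [v - mz + pooled for v in z]
    y_null = [v - my + pooled for v in y]
    observed = mz - my
    extreme = 0
    for _ in range(B):
        # Resample each group separately, with replacement, from the shifted data.
        zb = [rng.choice(z_null) for _ in z_null]
        yb = [rng.choice(y_null) for _ in y_null]
        if abs(mean(zb) - mean(yb)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (B + 1)
```

Because the resampling happens after the shift, the bootstrap differences are centred on zero. The observed difference is therefore compared with a null distribution, which is exactly what the case-resampled regression replicates lack.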

So my question is: how do I perform significance testing under the null 
hypothesis in case-resampling regression? As far as I can see, neither 
Davison & Hinkley (1997) nor Efron & Tibshirani (1993) seems to mention 
how to sample under the null in this setting. Is there some adjustment 
that I can introduce before case resampling (to the data) or after it 
(to the least-squares formula) that is easily implementable in R and 
lme4? Any ideas and/or algorithms would be greatly appreciated.

N.B. With all due respect, please don’t advise me to fit a GLM instead 
or to talk directly with Rob Tibshirani.

Thank you,

Aleksander Glowka
PhD Candidate in Linguistics
Stanford University

Works cited:

Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their 
Application. Cambridge, England: Cambridge University Press.

Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. 
New York: Chapman & Hall.