[R-sig-ME] bootstrapping coefficient p-values under null hypothesis in case-resampling multiple linear mixed-effects regression
Aleksander Główka
aglowka at stanford.edu
Sat Jan 27 23:19:35 CET 2018
Dear mixed-effects community,
I am fitting a multiple linear mixed-effects regression model in lme4.
The residual fit is near-linear, enough to warrant not assuming residual
homoscedasticity. One way to model regression without explicitly making
this assumption is to use case-resampling regression (Davison & Hinkley
1997), an application of the bootstrap (Efron & Tibshirani 1993).
In case-resampling regression, rather than assuming a normal
distribution for the T-statistic, we estimate the distribution of T
empirically. We mimic sampling from the original population by treating
the original sample as if it were the population: for each bootstrap
sample of size n we randomly select n values with replacement from the
original sample and then fit the regression to obtain coefficient
estimates, repeating this procedure R times.
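
As a minimal sketch of the resampling loop just described, the following
Python (standard library only) refits a slope on R bootstrap samples of n
cases each. It is an illustration under stated assumptions, not the lme4
workflow itself: a single-predictor least-squares fit stands in for the
mixed-model fit, individual rows are resampled as described above, and the
names ols_slope and case_resample_slopes are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def case_resample_slopes(xs, ys, R, seed=0):
    """Refit the slope on R bootstrap samples of n cases drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(R):
        # Draw n row indices with replacement; each index keeps its (x, y) pair intact.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    return slopes


# Toy data (made up for illustration): y is roughly 1 + 2x with small noise.
noise = random.Random(1)
x = [float(i) for i in range(30)]
y = [1.0 + 2.0 * xi + noise.gauss(0.0, 1.0) for xi in x]
boot_slopes = case_resample_slopes(x, y, R=200)
```

The list boot_slopes then plays the role of the empirical distribution of
the estimate; in the mixed-model setting the call to ols_slope would be
replaced by a refit of the full model on each resampled data set.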
Having applied this procedure, I am trying to calculate empirical
p-values for my regression coefficients. As in parametric regression, I
want to conduct the two-tailed hypothesis test of significance for slope
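with test statistic T under the null hypothesis H0: $\beta_1 = 0$. Since
we are treating the original sample as the population, t is the observed
value of T in the original sample. For each coefficient $\hat{\beta}_j$,
j = 0, 1, ..., p, we calculate the p-value as follows, where $T^*_r$ is
the statistic from the r-th bootstrap sample:
(1) $p = \min\left( \frac{1}{R} \sum_{r=1}^{R} 1\{T^*_r \ge t\},\ \frac{1}{R} \sum_{r=1}^{R} 1\{T^*_r \le t\} \right)$
Davison and Hinkley take $t = \hat{\beta}_1$, so that, in practice,
(2) $p = \min\left( \frac{1 + \sum_{r=1}^{R} 1\{\hat{\beta}^*_{1,r} \ge \hat{\beta}_1\}}{R + 1},\ \frac{1 + \sum_{r=1}^{R} 1\{\hat{\beta}^*_{1,r} \le \hat{\beta}_1\}}{R + 1} \right)$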
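
For concreteness, the calculation in (2) can be sketched in Python as
below. This only restates the formula: count the bootstrap estimates at
or above, and at or below, the observed estimate, add 1 to each count,
divide by R + 1, and take the smaller value. The function name
bootstrap_p_value and the toy numbers are hypothetical.

```python
def bootstrap_p_value(boot_estimates, observed):
    """Smaller of the two tail proportions in (2), each with the +1 correction."""
    R = len(boot_estimates)
    upper = (sum(1 for b in boot_estimates if b >= observed) + 1) / (R + 1)
    lower = (sum(1 for b in boot_estimates if b <= observed) + 1) / (R + 1)
    return min(upper, lower)


# Toy example: four bootstrap estimates and an observed estimate of 0.35.
p = bootstrap_p_value([0.1, 0.2, 0.3, 0.4], observed=0.35)
```

Because the bootstrap estimates here come from resampling the observed
data, they scatter around the observed estimate itself, which is exactly
the concern raised next.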
The major problem here is that the bootstrap samples were not sampled
under the null hypothesis, so in (1) and (2) we are evaluating the
alternative hypothesis rather than the null. Efron & Tibshirani (1993)
indeed caution that all hypothesis testing must be performed by sampling
under the null. This is relatively simple for, say, testing the
difference between two means, where the null is H0: $\mu_1 = \mu_2$, and
which requires only a simple transformation of the data prior to sampling.
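
To make the two-means case concrete, here is a minimal Python sketch of
one such transformation. It assumes the transformation meant is
recentering each group at the pooled mean, so that both groups share a
mean and the null holds in the resampling population. The raw difference
in means as test statistic, the two-sided comparison via absolute values,
and the name null_bootstrap_mean_test are likewise assumptions made for
illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def null_bootstrap_mean_test(z, y, R, seed=0):
    """Bootstrap p-value for equal means, resampling from recentered groups."""
    rng = random.Random(seed)
    pooled = mean(z + y)
    mz, my = mean(z), mean(y)
    # Shift each group so both have the pooled mean (the null is true by construction).
    z0 = [v - mz + pooled for v in z]
    y0 = [v - my + pooled for v in y]
    observed = mz - my
    count = 0
    for _ in range(R):
        zb = [rng.choice(z0) for _ in z0]
        yb = [rng.choice(y0) for _ in y0]
        if abs(mean(zb) - mean(yb)) >= abs(observed):
            count += 1
    return (count + 1) / (R + 1)


# Toy data (made up): two clearly separated groups.
p_null = null_bootstrap_mean_test([10.0, 11.0, 12.0, 13.0], [0.0, 1.0, 2.0, 3.0], R=99)
```

Only the group means are altered before resampling; the spread within
each group is left as observed.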
So my question here is: how do I perform significance testing under the
null hypothesis in case-resampling regression? As far as I could see,
neither Davison & Hinkley (1997) nor Efron & Tibshirani (1993) seems to
mention how to sample under the null in this setting. Is there some
adjustment that I can introduce before case-resampling (to the data) or
after it (to the least-squares formula) in a way that is easily
implementable in R and lme4? Any ideas and/or algorithms would be greatly
appreciated.
N.B. With all due respect, please don’t advise me to fit a GLM instead
or to talk directly with Rob Tibshirani.
Thank you,
Aleksander Glowka
PhD Candidate in Linguistics
Stanford University
Works cited:
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their
Application. Cambridge, England: Cambridge University Press.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap.
New York: Chapman & Hall.