[R-sig-ME] bootstrapping coefficient p-values under null hypothesis in case-resampling multiple linear mixed-effects regression

Robert LaBudde alethephant at verizon.net
Sun Jan 28 04:20:26 CET 2018


If your null hypothesis is that variable X has a coefficient of zero in 
the model, would not sampling under the null hypothesis be done by 
case-resampling every variable except X, and then resampling X 
separately from its own set of values?
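
Something along those lines, as a rough sketch only (purely hypothetical 
names: data frame dat, response y, fixed effects x and z, grouping 
factor g). Resampled this way, X carries no information about the 
response, which is what the null asserts:

library(lme4)

## Observed fit and observed t-statistic for x (hypothetical model)
fit_obs <- lmer(y ~ x + z + (1 | g), data = dat)
t_obs   <- summary(fit_obs)$coefficients["x", "t value"]

R <- 999
n <- nrow(dat)
t_null <- numeric(R)
for (r in seq_len(R)) {
  d   <- dat[sample(n, replace = TRUE), ]   # case-resample everything jointly ...
  d$x <- sample(dat$x, n, replace = TRUE)   # ... except x, resampled on its own (the null)
  t_null[r] <- summary(lmer(y ~ x + z + (1 | g), data = d))$coefficients["x", "t value"]
}
(sum(abs(t_null) >= abs(t_obs)) + 1) / (R + 1)   # two-sided empirical p-value

Whether row-level resampling is sensible at all for grouped data (as 
opposed to resampling whole clusters) is a separate question; the 
sketch is only meant to make the scheme concrete.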

It would appear wise to just do case resampling and construct a 
confidence interval for the coefficient from the bootstrap 
distribution. I avoid statistical hypothesis testing as much as 
possible.
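
For that route, a minimal sketch with the same hypothetical names (a 
plain percentile interval; lme4 also offers bootMer() for model-based 
bootstrapping, which is a different resampling scheme):

R <- 999
beta_x <- numeric(R)
for (r in seq_len(R)) {
  d <- dat[sample(nrow(dat), replace = TRUE), ]
  beta_x[r] <- fixef(lmer(y ~ x + z + (1 | g), data = d))["x"]
}
quantile(beta_x, c(0.025, 0.975))   # 95% percentile interval for the coefficient of x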


On 1/27/2018 5:19 PM, Aleksander Główka wrote:
> Dear mixed-effects community,
>
> I am fitting a multiple linear mixed-effects regression model in lme4.
> The residual fit is near-linear, enough that I would rather not assume
> residual homoscedasticity. One way to fit the regression without
> explicitly making this assumption is case-resampling regression
> (Davison & Hinkley 1997), an application of the bootstrap (Efron &
> Tibshirani 1993).
>
> In case-resampling regression, rather than assuming a normal
> distribution for the test statistic T, we estimate the distribution of
> T empirically. We mimic sampling from the original population by
> treating the original sample as if it were the population: for each
> bootstrap sample of size n we randomly draw n cases with replacement
> from the original sample and refit the regression to obtain coefficient
> estimates, repeating this procedure R times.
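>
> Concretely, what I am doing is roughly the following (a sketch with
> placeholder names only, not my actual variables: data frame dat,
> response y, fixed effects x and z, grouping factor g):
>
> library(lme4)
>
> ## Placeholder model: y ~ x + z + (1 | g)
> fit_obs  <- lmer(y ~ x + z + (1 | g), data = dat)
> beta_obs <- fixef(fit_obs)
>
> R <- 999
> beta_boot <- matrix(NA_real_, nrow = R, ncol = length(beta_obs),
>                     dimnames = list(NULL, names(beta_obs)))
> for (r in seq_len(R)) {
>   d <- dat[sample(nrow(dat), replace = TRUE), ]   # resample cases with replacement
>   beta_boot[r, ] <- fixef(lmer(y ~ x + z + (1 | g), data = d))
> }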
>
> Having applied this procedure, I am trying to calculate empirical
> p-values for my regression coefficients. As in parametric regression,
> I want to conduct a two-tailed test of significance for the slope,
> with test statistic T, under the null hypothesis H0: β^1 = 0. Since we
> are treating the original sample as the population, the observed value
> T = t comes from the original sample. For each of β^0, β^1, …, β^p we
> calculate the p-value as follows:
>
> (1) p = min( #{T* ≥ t}/R , #{T* ≤ t}/R )
>
> where T* is a bootstrap replicate of T and #{·} counts the replicates
> satisfying the condition.
>
> Davison and Hinkley take t = β^1, so that in practice
>
> (2) p = min( (#{β*^1 ≥ β^1} + 1)/(R + 1) , (#{β*^1 ≤ β^1} + 1)/(R + 1) )
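>
> In R, continuing the placeholder sketch above (b_obs is the observed
> slope for x, b_star the R bootstrap slopes):
>
> b_obs  <- beta_obs["x"]
> b_star <- beta_boot[, "x"]
> p_low  <- (sum(b_star <= b_obs) + 1) / (R + 1)
> p_high <- (sum(b_star >= b_obs) + 1) / (R + 1)
> min(p_low, p_high)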
>
> The major problem here is that the bootstrap samples were not drawn
> under the null hypothesis, so in (1) and (2) we are evaluating the
> distribution under the alternative rather than under the null. Efron &
> Tibshirani (1993) indeed caution that all hypothesis testing must be
> performed by sampling under the null. This is relatively simple for,
> say, testing the difference between two means, where the null is
> H0: μ1 = μ2 and only a simple transformation of the data is needed
> before resampling.
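>
> (For the two-means case the transformation, as I understand it, just
> shifts each sample to the combined mean before resampling, e.g. with
> placeholder vectors x1 and x2:
>
> m <- mean(c(x1, x2))
> x1_null <- x1 - mean(x1) + m   # both groups now share the same mean,
> x2_null <- x2 - mean(x2) + m   # so resampling from them respects H0
>
> and the bootstrap t-statistics are then computed from samples drawn
> from x1_null and x2_null. Nothing so obvious suggests itself for the
> regression case.)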
>
> So my question is: how do I perform significance testing under the
> null hypothesis in case-resampling regression? As far as I can see,
> neither Davison & Hinkley (1997) nor Efron & Tibshirani (1993) mention
> how to sample under the null in this setting. Is there some adjustment
> I can introduce before case resampling (to the data) or after it (to
> the least-squares formula) that is easily implementable in R and lme4?
> Any ideas and/or algorithms would be greatly appreciated.
>
> N.B. With all due respect, please don’t advise me to fit a GLM instead 
> or to talk directly with Rob Tibshirani.
>
> Thank you,
>
> Aleksander Glowka
> PhD Candidate in Linguistics
> Stanford University
>
> Works cited:
>
> Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and Their
> Application. Cambridge, England: Cambridge University Press.
>
> Efron, B. and R. J. Tibshirani (1993). An Introduction to the
> Bootstrap. New York: Chapman & Hall.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Robert A LaBudde, BS, MS, PhD, ChDipl ACAFS    President
Least Cost Formulations, Ltd                   URL: lcfltd.com
824 Timberlake Dr                              Tel: 757-467-0954
Virginia Beach, VA 23464                       Fax: 757-467-2947


