[R-sig-ME] What does "number of groups < 50"

Ben Bolker bbolker at gmail.com
Thu Oct 24 23:52:25 CEST 2013


  There is very little established theory giving error bounds for
different ways of approximating p-values (someone can please pop up and
correct me if they know of some!  Even better, they can edit
http://glmm.wikidot.com/faq and add them.).  In general, the ordering is

  parametric bootstrap > likelihood profile > Wald

in terms of both computational effort and accuracy, and the reverse
order in terms of the sample size required for the approximation to be
reliable.

  There are rules of thumb about total number of observations (N), ratio
of observations to parameters (N/k), residual degrees of freedom (N-k),
number of groups (n), etc., but I can't give you an authoritative
reference.  The number of grouping levels is often the most limiting
factor, so that's what I've quoted.  In my opinion (for whatever it's
worth), you should probably be OK with LRTs (you appear to have N=1548,
k=26, n=86, although I might have miscounted; N-k is large (>100), and
N/k is reasonably large (approx. 60)).  In general, what we mean by
"large" is that the distribution of the sum of N objects converges
reasonably well to a Normal (or the scaled distribution of the sum of
squares converges to a chi^2 distribution); for N>50 this should be
reasonable unless the underlying distribution is really pathological, or
you're trying to do tests on the extreme tails of the distributions ...
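
  As a rough check on the counts above, here is a minimal base-R sketch
that recomputes the rule-of-thumb quantities from the numbers in the
model printout quoted below. It assumes the printed deviance equals
-2*logLik (it does here: 2 * 838.7937 = 1677.5874), so the AIC penalty
gives the parameter count k directly. The variable names are
illustrative only.

```r
# Values copied from the model printout quoted below
N        <- 1548        # number of observations
n_groups <- 86          # levels of the grouping variable subj_ID
aic      <- 1731.5874
dev      <- 1677.5874   # equals -2 * logLik for this fit

# AIC = -2*logLik + 2k, so the penalty term recovers the parameter count
k <- (aic - dev) / 2

# Cross-check by counting parameters in the printout
k_fixed  <- 24          # fixed-effect coefficients listed
k_random <- 3           # two standard deviations plus one correlation

rules <- c(k = k,
           N_over_k = N / k,
           N_minus_k = N - k,
           obs_per_group = N / n_groups)
print(round(rules, 1))
```

On these numbers k comes out as 27 rather than 26 (24 fixed-effect
coefficients plus 3 variance-covariance parameters), so N/k is about 57
and N-k is 1521; the conclusion is unchanged.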

  The only way to be sure is to calculate PB confidence intervals for
your model (or for some subset), and compare them with the LRT intervals.
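
  A minimal sketch of such a comparison, assuming a small stand-in model
on the cbpp data that ships with lme4 (not the model from this thread)
and a deliberately low number of bootstrap replicates. It computes Wald,
profile and parametric bootstrap intervals for the fixed effects with
confint() and records how long each takes.

```r
library(lme4)

# Stand-in binomial GLMM on lme4's built-in cbpp data
# (an assumption for illustration; not the model discussed in this thread)
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

set.seed(101)  # make the bootstrap reproducible

# Intervals for the fixed effects only (parm = "beta_"), timing each method
t_wald <- system.time(
  ci_wald <- confint(fit, parm = "beta_", method = "Wald"))
t_prof <- system.time(
  ci_prof <- confint(fit, parm = "beta_", method = "profile"))
# nsim = 100 keeps the run short; real analyses need many more replicates
t_boot <- system.time(
  ci_boot <- confint(fit, parm = "beta_", method = "boot", nsim = 100))

# Side-by-side intervals: if the cheaper methods disagree noticeably with
# the bootstrap, they are probably not trustworthy for this model
ci_all <- cbind(ci_wald, ci_prof, ci_boot)
colnames(ci_all) <- paste(rep(c("wald", "profile", "boot"), each = 2),
                          c("lwr", "upr"), sep = ".")
print(round(ci_all, 2))

# Elapsed seconds per method
print(c(wald    = t_wald[["elapsed"]],
        profile = t_prof[["elapsed"]],
        boot    = t_boot[["elapsed"]]))
```

For a model the size of the one in this thread, restricting parm to the
few coefficients of interest is one way to keep the profile step
manageable; the bootstrap cost scales with nsim times the time of a
single fit.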

  Ben Bolker

On 13-10-24 03:34 PM, Maria Paola Bissiri wrote:
> Dear Prof. Bolker,
> thank you very much for your answer. Yes, in my model the random effects
> are the experiment participants:
> (1 + predictor | participant).
> There are 86 participants.
> 
> My question is whether likelihood ratio tests are reliable for calculating
> p-values for the parameters of my glmer model (family=binomial).
> I am trying parametric bootstrapping with bootMer and confint, but those
> scripts have been running for more than 1700 minutes (is it normal
> that it takes so long?). So I would prefer to use the LRT method,
> provided that it gives reliable results.
> 
> How can I find out whether LRT is suitable for my model? Is a number of
> groups > 50 sufficient?
> What is meant by "finite-size cases" in this FAQ?
> http://glmm.wikidot.com/faq#toc5
> For using LRT, are there also requirements regarding the total number of
> samples and of parameters?
> 
> Below I copy information about my model and the fitting.
> I would be thankful for any suggestion you could give me.
> Kind regards,
> Maria Paola
> 
>> fallmid.glmer6
> Generalized linear mixed model fit by maximum likelihood ['glmerMod']
>  Family: binomial ( logit )
> Formula: resp_X ~ lang * ini_pch + lang * manner + lang * fin_B + (1 +
>      fin_B | subj_ID)
>    Data: fallmid
>       AIC       BIC    logLik  deviance
> 1731.5874 1875.8948 -838.7937 1677.5874
> Random effects:
>  Groups  Name        Std.Dev. Corr
>  subj_ID (Intercept) 1.242
>          fin_Bm      1.956    -0.72
> Number of obs: 1548, groups: subj_ID, 86
> Fixed Effects:
>   (Intercept)      -1.76590
>   langde            0.71972
>   langen            0.14369
>   langsw            0.49608
>   ini_pchm          0.25780
>   ini_pchr         -0.02228
>   mannerla          0.02569
>   mannerna         -0.24087
>   fin_Bm            2.13501
>   langde:ini_pchm  -0.52407
>   langen:ini_pchm   0.11792
>   langsw:ini_pchm   0.26505
>   langde:ini_pchr  -0.73270
>   langen:ini_pchr  -1.01174
>   langsw:ini_pchr  -0.03017
>   langde:mannerla  -0.19740
>   langen:mannerla  -0.48532
>   langsw:mannerla   0.49438
>   langde:mannerna   0.41269
>   langen:mannerna   0.65082
>   langsw:mannerna   0.61956
>   langde:fin_Bm    -1.57816
>   langen:fin_Bm    -1.17330
>   langsw:fin_Bm    -2.32019
> 
>> # fitted() on a glmer fit already returns probabilities (response scale),
>> # so no further inverse-link transformation is applied here
>> probs.fallmid.glmer6 = fitted(fallmid.glmer6)
>> library(Hmisc)
>> fit.probs.fallmid.glmer6 = somers2(probs.fallmid.glmer6,
>> as.numeric(fallmid$resp_X)-1)
>> fit.probs.fallmid.glmer6
>            C          Dxy            n      Missing
>    0.8606880    0.7213759 1548.0000000    0.0000000
> 
> 
> 
> Quoting Ben Bolker <bbolker at gmail.com>:
> 
>> [forwarding to r-sig-mixed models]
>>
>> -------- Original Message --------
>> Subject: What does "number of groups < 50"
>> Resent-Date: Tue, 22 Oct 2013 07:42:08 -0400
>> Date: Tue, 22 Oct 2013 11:41:04 +0000
>> From: Maria Paola
>> To: bolker at mcmaster.ca
>>
>> Dear Prof. Bolker,
>> I am carrying out an analysis of my data fitting a GLMM with glmer()
>> (from lme4), family binomial.
>>
>> In the chapter "pvalues" in the manual (page 59)
>> http://cran.r-project.org/web/packages/lme4/lme4.pdf
>> you recommend the starred (*) methods "when the number of groups is <
>> 50".
>>
>> What exactly is meant by "number of groups"?
>>
>> I have in total 1548 observations and four groups of perception
>> experiment participants: German, Chinese, Swedish and English natives
>> (language is a predictor in the model), with a maximum of 30
>> participants per language.
>>
>> Using the starred (*) methods for GLMMs means bootstrapping, but I am
>> experiencing problems with that (e.g. too long calculation time), so I
>> would prefer to use Likelihood Ratio Tests.
>> However, I am not sure if this method is suitable. I am not sure if my
>> data fulfill the criterion of a number of groups > 50, since I do not
>> know what it means.
>> =============
>>
>>       It's not 100% clear since I don't know the structure of your
>> model exactly (presumably it's something like response ~ [fixed
>> effects] + (1|participant), or response ~ [fixed effects] +
>> (?|participant), i.e. you are using participant as a random effect),
>> but the 'number of groups' is the number of levels of the
>> random-effects grouping variable, i.e. probably the total number of
>> participants in your experiment (which sounds like it is probably >
>> 50). These numbers are reported in the output of glmer, in your case
>> it would look something like "Number of obs: 1548, groups:
>> participant, ??