[R-sig-ME] What does "number of groups < 50"

Maria Paola Bissiri Maria_Paola.Bissiri at tu-dresden.de
Mon Oct 28 19:43:58 CET 2013


Dear Prof. Bolker,
thank you very much for your answer.

My bootMer script completed the 1000 iterations and calculated confidence
intervals at the 0.95, 0.99 and 0.999 levels using the "norm", "basic"
and "perc" methods, but it gave the following warnings for each fixed
effect:
Warning : Basic Intervals used Extreme Quantiles
Some basic intervals may be unstable
Warning : Percentile Intervals used Extreme Quantiles
Some percentile intervals may be unstable

Can I then use the intervals calculated with the "norm" method? Is that
method based on a normality assumption that should be tested?
As my response variable is dichotomous, I am not sure what exactly I
should test for normality, or how.
I also wonder whether in my case the number of groups (86) is large
enough that a normality check is unnecessary.

I copy below the bootMer script and, as an example, the output for one
of the fixed effects.
Kind regards,
Maria Paola Bissiri


Script:
-------

# Function passed to bootMer: extract the fixed-effect estimates
# from each refitted (simulated) model
FUN <- function(fit) {
      fixef(fit)
}

# nsim = number of parametric-bootstrap simulations
fm.glmer6.bM <- bootMer(fallmid.glmer6, FUN, nsim = 1000)

library(boot)

# Confidence intervals for each fixed effect in turn
for (n in seq_along(fixef(fallmid.glmer6))){
    print(fixef(fallmid.glmer6)[n])
    bMCI <- boot.ci(fm.glmer6.bM, conf = c(0.95, 0.99, 0.999),
                    index = n, type = c("norm", "basic", "perc"))
    print(bMCI)
}
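
[Two possible refinements, sketched here rather than taken from the original
script: the 95% "norm" endpoints can be collected into one table instead of
printing every boot.ci object, and in newer lme4 versions bootMer can spread
the simulations over several cores, which helps with the long run times
mentioned further down in this thread.]

# Sketch: 95% normal-approximation intervals for all fixed effects in one table
ci.norm <- t(sapply(seq_along(fixef(fallmid.glmer6)), function(n)
    boot.ci(fm.glmer6.bM, conf = 0.95, type = "norm", index = n)$normal[2:3]))
dimnames(ci.norm) <- list(names(fixef(fallmid.glmer6)), c("lower", "upper"))
ci.norm

# Sketch: parallel bootstrap (argument names as in newer lme4 versions)
# fm.glmer6.bM <- bootMer(fallmid.glmer6, FUN, nsim = 1000,
#                         parallel = "multicore", ncpus = 4)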


Example of output:
------------------

langen:ini_pchr
        -1.009678
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = fm.glmer6.bM, conf = c(0.95, 0.99, 0.999),
      type = c("norm", "basic", "perc"), index = n)

Intervals :
Level      Normal              Basic              Percentile
95%   (-1.986,  0.032 )   (-1.981,  0.088 )   (-2.107, -0.038 )
99%   (-2.303,  0.349 )   (-2.242,  0.376 )   (-2.395,  0.223 )
99.9%   (-2.670,  0.717 )   (-2.471,  0.581 )   (-2.600,  0.451 )
Calculations and Intervals on Original Scale
Warning : Basic Intervals used Extreme Quantiles
Some basic intervals may be unstable
Warning : Percentile Intervals used Extreme Quantiles
Some percentile intervals may be unstable
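
[A note on the warnings, sketched from how boot.ci works rather than from the
original thread: they are triggered by the 99.9% basic/percentile intervals,
not by anything specific to this model. With R = 1000 replicates, the 0.05%
and 99.95% quantiles would need order statistics of rank (R + 1) * alpha, and
those ranks fall outside 1..R, so the endpoints rest on the most extreme
replicates:]

R <- 1000
alpha <- c(0.0005, 0.9995)   # tail probabilities for a 99.9% interval
(R + 1) * alpha              # 0.5005 and 1000.5, outside the range 1..1000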




Quoting Ben Bolker <bbolker at gmail.com>:

> There is very little established theory giving error bounds for
> different ways of approximating p-values (someone can please pop up and
> correct me if they know of some!  Even better, they can edit
> http://glmm.wikidot.com/faq and add them.).  In general, the ordering is
>
>   parametric bootstrap > likelihood profile > Wald
>
> in terms of computational effort, and the reverse order in terms of
> accuracy/required sample size.
>
>   There are rules of thumb about total number of observations (N), ratio
> of observations to parameters (N/k), residual degrees of freedom (N-k),
> number of groups (n), etc., but I can't give you an authoritative
> reference.  The number of grouping levels is often the most limiting
> factor, so that's what I've quoted.  In my opinion (for whatever it's
> worth), you should probably be OK with LRTs (you appear to have N=1548,
> k=26, n=86, although I might have miscounted; N-k is large (>100), and
> N/k is reasonably large (approx. 60)).  In general, what we mean by
> "large" is that the distribution of the sum of N objects converges
> reasonably well to a Normal (or the scaled distribution of the sum of
> squares converges to a chi^2 distribution); for N>50 this should be
> reasonable unless the underlying distribution is really pathological, or
> you're trying to do tests on the extreme tails of the distributions ...
>
>   The only way to be sure is to calculate PB confidence intervals for
> your model (or for some subset), and compare them with the LRT intervals.
>
>   Ben Bolker
>
> On 13-10-24 03:34 PM, Maria Paola Bissiri wrote:
>> Dear Prof. Bolker,
>> thank you very much for your answer. Yes, in my model the random effects
>> are the experiment participants:
>> (1 + predictor | participant).
>> There are 86 participants.
>>
>> My question is if likelihood ratio tests are reliable for calculating
>> p-values for the parameters of my glmer model (family=binomial).
>> I am trying parametric bootstrapping with bootMer and confint, but those
>> scripts have been running for more than 1700 minutes (is it normal
>> that they take so long?). So I would prefer to use the LRT method,
>> provided that it gives reliable results.
>>
>> How can I find out if LRT is suitable for my model? Is a number of
>> groups > 50 sufficient?
>> What is meant by "finite-size cases" in this FAQ?
>> http://glmm.wikidot.com/faq#toc5
>> For using LRT, are there also requirements regarding the total number
>> of observations and of parameters?
>>
>> Below I copy information about my model and the fitting.
>> I would be thankful for any suggestion you could give me.
>> Kind regards,
>> Maria Paola
>>
>>> fallmid.glmer6
>> Generalized linear mixed model fit by maximum likelihood ['glmerMod']
>>  Family: binomial ( logit )
>> Formula: resp_X ~ lang * ini_pch + lang * manner + lang * fin_B + (1 +
>>      fin_B | subj_ID)
>>    Data: fallmid
>>       AIC       BIC    logLik  deviance
>> 1731.5874 1875.8948 -838.7937 1677.5874
>> Random effects:
>>  Groups  Name        Std.Dev. Corr
>>  subj_ID (Intercept) 1.242
>>          fin_Bm      1.956    -0.72
>> Number of obs: 1548, groups: subj_ID, 86
>> Fixed Effects:
>>     (Intercept)           langde           langen           langsw
>>      ini_pchm         ini_pchr         mannerla         mannerna
>>      fin_Bm
>>        -1.76590          0.71972          0.14369          0.49608
>>       0.25780         -0.02228          0.02569         -0.24087
>>     2.13501
>> langde:ini_pchm  langen:ini_pchm  langsw:ini_pchm  langde:ini_pchr
>> langen:ini_pchr  langsw:ini_pchr  langde:mannerla  langen:mannerla
>> langsw:mannerla
>>        -0.52407          0.11792          0.26505         -0.73270
>>      -1.01174         -0.03017         -0.19740         -0.48532
>>     0.49438
>> langde:mannerna  langen:mannerna  langsw:mannerna    langde:fin_Bm
>> langen:fin_Bm    langsw:fin_Bm
>>         0.41269          0.65082          0.61956         -1.57816
>>      -1.17330         -2.32019
>>
>>> probs.fallmid.glmer6 = 1/(1+exp(-fitted(fallmid.glmer6)))
>>> probs.fallmid.glmer6 = binomial()$linkinv(fitted(fallmid.glmer6))
>>> library(Hmisc)
>>> fit.probs.fallmid.glmer6 = somers2(probs.fallmid.glmer6,
>>> as.numeric(fallmid$resp_X)-1)
>>> fit.probs.fallmid.glmer6
>>            C          Dxy            n      Missing
>>    0.8606880    0.7213759 1548.0000000    0.0000000
>>
>>
>>
>> Quoting Ben Bolker <bbolker at gmail.com>:
>>
>>> [forwarding to r-sig-mixed models]
>>>
>>> -------- Original Message --------
>>> Subject: What does "number of groups < 50"
>>> Resent-Date: Tue, 22 Oct 2013 07:42:08 -0400
>>> Date: Tue, 22 Oct 2013 11:41:04 +0000
>>> From: Maria Paola
>>> To: bolker at mcmaster.ca
>>>
>>> Dear Prof. Bolker,
>>> I am carrying out an analysis of my data fitting a GLMM with glmer()
>>> (from lme4), family binomial.
>>>
>>> In the section "pvalues" of the manual (page 59)
>>> http://cran.r-project.org/web/packages/lme4/lme4.pdf
>>> you recommend the starred (*) methods "when the number of groups is <
>>> 50".
>>>
>>> What exactly is meant by "number of groups"?
>>>
>>> I have in total 1548 observations and four groups of perception
>>> experiment participants: German, Chinese, Swedish and English natives
>>> (language is a predictor in the model), with a maximum of 30
>>> participants per language.
>>>
>>> Using the starred (*) methods for GLMMs means bootstrapping, but I am
>>> experiencing problems with that (e.g. very long calculation times), so I
>>> would prefer to use likelihood ratio tests.
>>> However, I am not sure if this method is suitable, or whether my data
>>> fulfill the criterion of a number of groups > 50, since I do not know
>>> what that criterion means.
>>> =============
>>>
>>>       It's not 100% clear since I don't know the structure of your
>>> model exactly (presumably it's something like response ~ [fixed
>>> effects] + (1|participant), or response ~ [fixed effects] +
>>> (?|participant), i.e. you are using participant as a random effect),
>>> but the 'number of groups' is the number of levels of the
>>> random-effects grouping variable, i.e. probably the total number of
>>> participants in your experiment (which sounds like it is probably >
>>> 50). These numbers are reported in the output of glmer, in your case
>>> it would look something like "Number of obs: 1548, groups:
>>> participant, ??
>>

-- 
Dr. Maria Paola Bissiri

TU Dresden
Fakultät Elektrotechnik und Informationstechnik
Institut für Akustik und Sprachkommunikation
01062 Dresden

Barkhausen-Bau, Raum S54
Helmholtzstraße 18

Tel: +49 (0)351 463-34283
Fax: +49 (0)351 463-37781
E-Mail: Maria_Paola.Bissiri at tu-dresden.de
http://wwwpub.zih.tu-dresden.de/~bissiri/index.htm


