[R-sig-ME] Too small a sample size for lmer?

Martin Maechler maechler at stat.math.ethz.ch
Sat Jul 18 16:59:00 CEST 2009


>>>>> "CG" == Christine Griffiths <Christine.Griffiths at bristol.ac.uk>
>>>>>     on Sat, 18 Jul 2009 13:58:36 +0100 writes:

    CG> Dear R users,
    CG> Many of you may be familiar with my design as I have posted a number of 
    CG> queries before. Having consulted with someone in my department about 
    CG> estimating bias corrected confidence intervals for small sample sizes 
    CG> (rather than MCMC which Baayen et al. 2008 suggest should not be used), 
    CG> they implied that I should not be using lmer for such a small sample size 
    CG> as lmer was designed to deal with very large datasets. Is this still the 
    CG> case? If so what is regarded as a small sample size?

The fact that it was designed *to be able* to deal with big data
sets does not mean that it was not appropriate for small data
sets as well.
It's just that mixed-effects models with large data sets and
crossed random effects can currently *only* be analyzed with
lmer {no other software is available, not even if you pay a
lot}.

Having said all that, I think your situation looks like a case where I
would want to use a bootstrap (probably a parametric one),
and interestingly enough, at the UseR! 2009 meeting in Rennes,
10 days ago, there was a nice talk on this topic:

   Jose A. Sanchez-Espigares, Jordi Ocaña
   An R implementation of bootstrap procedures for mixed models

You can find the abstract *and* slides on
  http://www.agrocampus-ouest.fr/math/useR-2009/abstracts/user_author.html

I don't think that their R code is publicly available yet,
but I've CC'ed one of the authors, and they may be willing to
let you use their code before release.
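
For orientation, here is a minimal sketch of what a parametric
bootstrap for an lmer fit could look like. It is not the code from
the talk above. It assumes a version of lme4 that provides bootMer(),
and it uses the 'sleepstudy' data shipped with lme4 purely as a
stand-in; the model formula and nsim = 200 are illustrative choices,
not recommendations for your design.

```r
library(lme4)

# Stand-in data (assumption): 'sleepstudy' ships with lme4.
# Random intercept per Subject, fixed effect of Days.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

set.seed(2009)  # make the simulation reproducible

# Parametric bootstrap: simulate new responses from the fitted model,
# refit the model to each simulated response, and keep the fixed effects.
pb <- bootMer(fit, FUN = fixef, nsim = 200, type = "parametric")

# Percentile intervals: one column per fixed effect,
# rows are the 2.5% and 97.5% quantiles of the bootstrap replicates.
ci <- apply(pb$t, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
print(ci)
```

For your data you would replace the formula with one that reflects
your blocks, enclosures and months, and probably use a larger nsim;
200 only keeps the sketch quick to run.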

Martin Maechler, ETH Zurich

    CG> Below is a description of my data. I have 5/6 enclosures (replicates) per 
    CG> treatment - Aldabra/Radiata/control. Aldabra and radiata refer to two 
    CG> different tortoise species, while control lacks tortoises. The enclosures 
    CG> were assigned to a block: a block containing each of the 3 treatments, i.e. 
    CG> 6 blocks in total. Each month for ten months I collected data: a repeated 
    CG> crossed design. Unfortunately, I have non-orthogonal, unbalanced data (5/6 
    CG> enclosures per treatment) as I cannot use a replicate within the aldabra 
    CG> and radiata treatments. These are however from different blocks so I am 
    CG> reluctant to axe them to achieve balanced data as this would leave me only 
    CG> 4 blocks. I measured various attributes which I think that tortoises would 
    CG> have an impact on, e.g. plant count, species richness. Because my data is 
    CG> unbalanced and a repeated measures design I had chosen lmer to best model 
    CG> this.

    CG> For one other aspect, I calculate food web properties, for which I have no 
    CG> replication, i.e. only one observation per treatment per month. Would lmer 
    CG> be an acceptable way to analyse this data?

    CG> If lmer is not advised for the analysis of these data, what other analysis 
    CG> techniques should I investigate?

    CG> Baayen et al. (2008). Mixed-effects modeling with crossed random effects
    CG> for subjects and items. Journal of Memory and Language, 59, 390-412.

    CG> Many thanks,
    CG> Christine



