[R-sig-ME] How many groups is enough?

Highland Statistics Ltd. highstat at highstat.com
Mon Aug 31 15:11:40 CEST 2009


>> One point, is that for most analyses we are not interested in estimates of
>> the random effect variances.
Unless you stick the estimated variances in these nice power analysis 
equations (e.g. in Snijders and Bosker, 1999 if I remember well) that 
calculate the intraclass correlation, and tell you how many observations 
to take next time.
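
As a rough illustration of that kind of calculation, here is a minimal sketch in R. It assumes a two-level random-intercept model with equal group sizes and uses the standard intraclass correlation and design-effect formulas. These are not necessarily the exact equations in Snijders and Bosker, and the variance values and sample sizes are made up for illustration only.

```r
# Hypothetical variance estimates from a random-intercept model
# (illustrative values, not taken from any data set in this thread)
sigma2_group <- 2.0   # between-group (random intercept) variance
sigma2_resid <- 6.0   # within-group (residual) variance

# Intraclass correlation: share of total variance due to groups
icc <- sigma2_group / (sigma2_group + sigma2_resid)

# Design effect for equal group size n: 1 + (n - 1) * ICC
design_effect <- function(n, icc) 1 + (n - 1) * icc

n_per_group <- 10     # assumed observations per group
n_groups    <- 9      # assumed number of groups
n_total     <- n_per_group * n_groups

# Effective sample size: what the clustered sample is "worth"
# in terms of independent observations
deff  <- design_effect(n_per_group, icc)
n_eff <- n_total / deff

print(c(icc = icc, deff = deff, n_eff = n_eff))
```

With a sizeable intraclass correlation, adding observations within a group buys much less than adding groups, which is why the variance estimates matter for planning the next study.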

Alain


>>  My impression is that other parameter estimates
>> are fairly robust to the random effects variance, so if the models fit
>> sensibly then it seems a reasonable approach. One problem with a small
>> number of groups may be the use of Empirical Bayes, as it ignores the
>> estimate uncertainty. I assume somebody has written a paper on this,
>> advocating full Bayesian analysis. People seem happy to do random effects
>> meta-analysis with only a few trials.
>>     
>
> I agree that the precision of estimates of the variance components can
> be poor and that this is not that much of a problem when one is
> primarily interested in the estimates of the fixed-effects parameters.
>  (By the way, the statement that "REML estimates are unbiased" is not
> true in general.  Even in the simple, balanced cases where they are
> unbiased, I don't think it is an important property because the
> distribution of the estimator is so skewed that characterizing the
> distribution by its mean is unrealistic.).
>
> I produced some plots of the profiled likelihood of the variance
> components for a simple, balanced example with 6 groups (a model for
> the Dyestuff data).  They are rather sobering although one should
> expect highly skewed patterns for a variance estimate (think of the
> simplest case of the estimate of a variance from the mythical i.i.d.
> Gaussian sample).  The plots are available in
> http://lme4.r-forge.r-project.org/slides/2009-07-21-Seewiesen/4PrecisionD.pdf
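
To get a feel for how skewed such an estimator is with only 6 groups, here is a small simulation sketch in base R. It mimics the layout of the Dyestuff data (6 groups of 5 observations) but uses made-up standard deviations. Note that it uses the classical ANOVA (method-of-moments) estimator of the between-group variance, not REML and not the profiled likelihood from the slides. The function name sim_between_var is just a label chosen for this example.

```r
set.seed(1)

# Simulate one balanced one-way data set and return the ANOVA
# (method-of-moments) estimate of the between-group variance.
sim_between_var <- function(n_groups, n_per_group, sd_group, sd_resid) {
  g <- factor(rep(seq_len(n_groups), each = n_per_group))
  b <- rnorm(n_groups, sd = sd_group)             # group effects
  y <- b[g] + rnorm(n_groups * n_per_group, sd = sd_resid)
  ms <- anova(lm(y ~ g))[["Mean Sq"]]             # c(MS between, MS within)
  (ms[1] - ms[2]) / n_per_group                   # can be negative
}

# 6 groups of 5 observations; true between-group variance = 1
# (sd_group and sd_resid are assumptions for illustration)
est <- replicate(2000, sim_between_var(6, 5, sd_group = 1, sd_resid = 1))

print(c(mean       = mean(est),
        median     = median(est),
        prop_le_0  = mean(est <= 0)))
print(quantile(est, c(0.025, 0.25, 0.5, 0.75, 0.975)))
```

The mean sits close to the true value while the median falls below it, and a visible fraction of the estimates are zero or negative, which a likelihood-based fit would report as a variance of zero.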
>
>   
>> Ken
>>
>>     
>>> So the question remains: "what is a 'small' number of groups?".  I'm
>>> not sure but the following may be suggestive, at least of the symmetry
>>> of the sampling distribution (i.e. chi sq w/ df = # groups - 1):
>>>
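>>> # Chi-squared densities with df = number of groups - 1
>>> ngroups <- c(4, 6, 10, 15, 20)
>>> # Empty plotting frame; the curves are added one at a time below
>>> plot(0, type='n', xlim=c(0, 30), ylim=c(0, .3),
>>>      xlab='x', ylab='density')
>>> for (i in ngroups) {
>>>  curve(dchisq(x, df = i - 1), from = 0, to = 30, add = TRUE)
>>> }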
>>>
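
The asymmetry in those curves can also be put into numbers. The sketch below assumes the textbook case of a variance estimated from an i.i.d. Gaussian sample, where the estimator follows a scaled chi-squared distribution. It tabulates the skewness, sqrt(8/df), and the factors by which the estimate is multiplied to get a 95% confidence interval for the true variance. The value 50 is added to the list only for comparison with the paper cited below. For group-level variances in a mixed model this is a rough guide, not an exact result.

```r
ngroups <- c(4, 6, 10, 15, 20, 50)
df <- ngroups - 1

# Skewness of a chi-squared distribution with df degrees of freedom
skewness <- sqrt(8 / df)

# If df * s2 / sigma2 ~ chi-squared(df), a 95% interval for sigma2 is
# [s2 * df / qchisq(0.975, df), s2 * df / qchisq(0.025, df)]
ci_lower <- df / qchisq(0.975, df)
ci_upper <- df / qchisq(0.025, df)

res <- data.frame(ngroups, df,
                  skewness = round(skewness, 2),
                  lower    = round(ci_lower, 2),
                  upper    = round(ci_upper, 2))
print(res)
```

The lower and upper columns are multipliers of the estimated variance, so an upper factor well above 1 with a lower factor not far below 1 shows how lopsided the interval is for a handful of groups.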
>>>
>>> Also, googling turned up the paper below, which for a sub-class of
>>> mixed models suggests that >=50 groups is sufficient to get
>>> group-level variances and standard errors that are unbiased (but not
>>> necessarily low-variance, AFAICS).
>>>
>>> @article{maas2005sufficient,
>>>  title={{Sufficient sample sizes for multilevel modeling}},
>>>  author={Maas, C.J.M. and Hox, J.J.},
>>>  journal={Methodology},
>>>  volume={1},
>>>  number={3},
>>>  pages={86--92},
>>>  year={2005},
>>>  abstract={An important problem in multilevel modeling is what
>>> constitutes a sufficient sample size for accurate estimation. In
>>> multilevel analysis, the major restriction is often the higher-level
>>> sample size. In this paper, a simulation study is used to determine
>>> the influence of different sample sizes at the group level on the
>>> accuracy of the estimates (regression coefficients and variances)
>>> and their standard errors. In addition, the influence of other
>>> factors, such as the lowest-level sample size and different variance
>>> distributions between the levels (different intraclass correlations),
>>> is examined. The results show that only a small sample size
>>> at level two (meaning a sample of 50 or less) leads to biased
>>> estimates of the second-level standard errors. In all of the other
>>> simulated conditions the estimates of the regression coefficients, the
>>> variance components, and the standard errors are unbiased
>>> and accurate.}
>>> }
>>>
>>>
>>> hth,
>>>
>>> Kingsford Jones
>>>
>>>
>>>
>>>
>>> On Sun, Aug 30, 2009 at 5:53 AM, Highland Statistics
>>> Ltd.<highstat at highstat.com> wrote:
>>>       
>>>>> Alain Zuur's response to a recent posting raises an interesting
>>>>> question. To use a random effects model, what number of groups is
>>>>> actually sufficient?
>>>>>
>>>>> I have heard talk of a minimum of 20 groups but have seen numerous
>>>>> examples in books and published papers with far fewer than this. Is
>>>>> there a definitive reference on this?
>>>>>
>>>>>
>>>>>           
>>>> Graham,
>>>>
>>>> Actually, it turned out that the data set for which the question was
>>>> asked had about 350 subjects, I believe.
>>>>
>>>> But anyway, that is not your question. In general you see the magic "5"
>>>> in some textbooks. For what it is worth, I recently had to program a
>>>> ZIP (zero-inflated Poisson) model for 2-way nested data in RBugs, and in
>>>> order to do this I started with 1-way and 2-way GLMMs (just to build up
>>>> the code). To check whether my code was "correct", I compared the
>>>> results with those of 3-4 R packages (e.g. glmmPQL, lmer, glmmML). The
>>>> data set consisted of multiple observations per animal, for 5-30
>>>> animals per colony, and 9 colonies. I noticed that the estimated
>>>> variance of the colony random intercept differed a lot between these
>>>> packages, but all gave similar estimates for the animal-within-colony
>>>> random intercept.
>>>>
>>>> Not that it tells you that much (all packages giving the same result
>>>> doesn't mean it is correct), but it is a bit worrying. Perhaps a
>>>> simulation study gives you a better answer. The data I use(d) are
>>>> highly unbalanced, so that may have played a role as well.
>>>>
>>>> Alain
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Dr. Alain F. Zuur
>>>> First author of:
>>>>
>>>> 1. Analysing Ecological Data (2007).
>>>> Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
>>>> URL: www.springer.com/0-387-45967-7
>>>>
>>>>
>>>> 2. Mixed effects models and extensions in ecology with R. (2009).
>>>> Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
>>>> http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9
>>>>
>>>>
>>>> 3. A Beginner's Guide to R (2009).
>>>> Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
>>>> http://www.springer.com/statistics/computational/book/978-0-387-93836-3
>>>>
>>>>
>>>> Other books: http://www.highstat.com/books.htm
>>>>
>>>>
>>>> Statistical consultancy, courses, data analysis and software
>>>> Highland Statistics Ltd.
>>>> 6 Laverock road
>>>> UK - AB41 6FN Newburgh
>>>> Tel: 0044 1358 788177
>>>> Email: highstat at highstat.com
>>>> URL: www.highstat.com
>>>> URL: www.brodgar.com
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>

