[R-sig-ME] Distributional assumptions + case studies (was: Random or Fixed effects appropriate?)
Douglas Bates
bates at stat.wisc.edu
Thu Apr 10 00:27:18 CEST 2008
On 4/9/08, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> Hi Reinhold,
> On Wed, Apr 09, 2008 at 05:45:54PM +0200, Reinhold Kliegl wrote:
> > I think this is a reasonable summary.
> > You were not clear on how you plan to use the conditional modes (i.e.,
> > your point 1). Please keep in mind that conditional modes are not
> > independent "observations" like a group mean or within-group effect or
> > slope, simply because shrinkage correction uses all data. Also, for
> > example, their correlations (i.e., between intercept and x for units
> > of C) are typically not identical to the estimated model correlations
> > displayed in the random-effects part (see also the Bates quote in my
> > last comment).
> > In analyses of reaction times (using subjects and items as crossed
> > random factors; carried out with Mike Masson and Eike Richter, 2007),
> > model-based estimates of correlations among random effects revealed
> > "clearer" patterns than the correlations between means and effects
> > computed for each subject (as they should, given that they were
> > corrected for unreliability). Unlike for fixed-effects estimates,
> > however, estimates of correlations among random effects were quite
> > susceptible to violations of distributional assumptions for the
> > residuals--up to a change in the sign of the correlation!
> This is a very interesting observation, and one that I suspect should
> not be buried in an email. Can you tell us more about it? In my
> workshops, I spend a lot of time focusing on the use of diagnostics to
> check distributional assumptions. It would be fabulous to be able to
> identify a case study in which getting the distributional assumptions
> right was so clearly important.
> More generally, I wonder if it might be worth collecting such a set of
> case studies with clear and thorough analyses and wrapping them in a
> document. It seems to me that it would answer the request made by
> Iasonas Lamprianou recently.
> I'd be happy to coordinate such an effort, so long as the
> contributions were in LaTeX and Sweave. I know my students would
> benefit from it :)
> Is there any interest in such an idea, from potential contributors or
> (equally importantly) potential users?
I certainly would be delighted to have such a collection made
available and would be happy to have it hosted on
http://lme4.r-forge.r-project.org/ if that seemed suitable.
I would also recommend some of the examples in chapter 7 of Harald
Baayen's new book "Analyzing Linguistic Data: A Practical Introduction
to Statistics using R"
# Paperback: 368 pages
# Publisher: Cambridge University Press; 1 edition (March 17, 2008)
# Language: English
# ISBN-10: 0521709180
# ISBN-13: 978-0521709187
> > As far as
> > the use of conditional modes is concerned, the absolute values of
> > correlations between conditional modes were always larger than the
> > corresponding model estimates.
> > In simulations, the model estimates of correlations recovered the
> > "true" variances and correlations, even after random deletion of 50%
> > of the data, but the variance of the conditional modes always
> > underestimated the true variance and the difference between model
> > estimate and correlation based on conditional modes increased with the
> > absolute magnitude of the correlation. In other words, conditional
> > modes underestimated the variance and exaggerated covariances and
> > correlations of random effects in these simulations. The shrinkage in
> > variance reflects the contribution of the likelihood in the
> > computation of the conditional modes. In summary, according to these
> > simulations, the model estimates of correlations among random effects
> > are fine; the computed correlations based on conditional modes may
> > serve a useful heuristic function for further analyses but must be
> > handled with care.
> >
> > Best
> > Reinhold
> >
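[A minimal sketch of the comparison Reinhold describes above, not part of
the original exchange. It assumes the sleepstudy data shipped with lme4 as
a stand-in for the reaction-time data, which are not available here, and it
sets the model-based variances and correlation beside the same quantities
computed naively from the conditional modes.]

```r
library(lme4)

# Random intercepts and slopes per subject (sleepstudy ships with lme4;
# it is an assumed stand-in, not the data discussed in the thread).
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Model-based estimates: variances and correlation of the random effects.
vc <- VarCorr(fm)$Subject
model_var <- attr(vc, "stddev")^2
model_cor <- attr(vc, "correlation")[1, 2]

# Conditional modes: one row per subject, columns "(Intercept)" and "Days".
cm <- ranef(fm)$Subject
mode_var <- sapply(cm, var)
mode_cor <- cor(cm[["(Intercept)"]], cm[["Days"]])

# Side-by-side comparison of the two sets of summaries.
comparison <- rbind(model = c(model_var, cor = model_cor),
                    modes = c(mode_var, cor = mode_cor))
print(comparison)
```

[The "modes" row treats the conditional modes as if they were independent
observations, which is exactly the practice the paragraph above warns
about, so any gap between the two rows is the quantity of interest.]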
> > On Wed, Apr 9, 2008 at 11:21 AM, Nick Isaac <njbisaac at googlemail.com> wrote:
> > > Dear all,
> > >
> > > Thanks for the comments and apologies for not providing more
> > > information. I (mis)judged it would be better to discuss the issue
> > > abstractly. There should be enough levels to estimate the variance of
> > > C and at least one other random effect:
> > >
> > > Number of obs: 1242, groups: D, 269; C, 64; B, 8; A, 3
> > >
> > > My interpretation of comments by all three respondents is as follows:
> > > 1) extracting the random effects/BLUPs/conditional modes is reasonable
> > > in general
> > > 2) a taxonomy might be considered fixed or random, depending on the
> > > question and the number of units/levels
> > > 3) In my case, it would be better to use the conditional modes for x|C
> > > than to fit x*C as an interaction term.
> > >
> > > Best wishes, Nick
> > >
> > >
> > >
> > >
> > > On 08/04/2008, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> > > > On Tue, Apr 08, 2008 at 07:10:16PM +0200, Reinhold Kliegl wrote:
> > > > > > My dataset has one continuous normally-distributed fixed effect and
> > > > > > four random effects that are nested (in fact, it is a taxonomy). For
> > > > > > simplicity, I've removed the variable names, so the dataset has the
> > > > > > following structure:
> > > > > >
> > > > > > y ~ x | A/B/C/D
> > > > > It would be good to know how many units/levels you have for each of
> > > > > your four random effects. Those with fewer than, say, five, are good
> > > > > candidates for being specified as fixed effects. Think how many
> > > > > observations you need to get a stable estimate of a variance!
> > > > >
> > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + C + x:C) #error:
> > > > > > Downdated X'X is not positive definite, 82
> > > > > You cannot include C both as a random and a fixed effect
> > > >
> > > >
> > > >
> > > > I do not believe that this is generally true. See, for example,
> > > >
> > > > > require(lme4)
> > > > > (fm1 <- lmer(Reaction ~ Days + Subject + (Days|Subject), sleepstudy))
> > > >
> > > > Therefore I am uncertain as to how you can draw this conclusion
> > > > without more information about the design (which the poster really
> > > > should have provided).
> > > >
> > > >
> > > >
> > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + x:C) #gives sensible results
> > > > > If this gives sensible results, I suspect you have very few levels of
> > > > > C, say, 2 or 3?
> > > > > In this case, definitely specify C and x and their interaction as
> > > > > fixed effects, e.g.:
> > > > > > lmer( y ~ x*C + (1|A) + (1|B) + (1|D))
> > > > >
> > > > > The following may not apply to your case, but it might: Sometimes
> > > > > people think that a nested/taxonomic design implies a random effect
> > > > > structure (e.g., schools, classes, students). This is not true. If you
> > > > > > have only a few units for each factor, you are better off specifying
> > > > > > it as a fixed-effects rather than a random-effects taxonomy. (Of
> > > > > > course, you lose generalizability, but if you want this you should
> > > > > > make sure you have a sample that provides a basis for it.)
> > > >
> > > >
> > > > I can see the sense behind this position but sometimes a few units are
> > > > all that is available, and including them in a model as fixed effects
> > > > muddies the statistical waters, especially if they are the kinds of
> > > > effects that a model user will be unlikely to naturally condition upon.
> > > >
> > > > I do agree that if there are problems with model fitting and/or
> > > > interpretation when the design is rigorously followed, then a more
> > > > flexible approach can and should be adopted, and appropriate
> > > > allowances must be made.
> > > >
> > > >
> > > > > The interpretation of conditional modes (formerly known as BLUPs,
> > > > > that is "predictions") is a tricky business, especially with few
> > > > > units per level.
> > > >
> > > >
> > > > Sorry, I think I've missed something. In what sense are the
> > > > conditional modes formerly known as BLUPs?
> > > >
> > > > Andrew
> > > >
> > > >
> > > > --
> > > > Andrew Robinson
> > > > Department of Mathematics and Statistics Tel: +61-3-8344-6410
> > > > University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
> > > > http://www.ms.unimelb.edu.au/~andrewpr
> > > > http://blogs.mbs.edu/fishing-in-the-bay/
> > > >
> > >
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>