[R-sig-ME] Distributional assumptions + case studies (was: Random or Fixed effects appropriate?)
Douglas Bates
bates at stat.wisc.edu
Thu Apr 10 00:28:33 CEST 2008
On 4/9/08, Douglas Bates <bates at stat.wisc.edu> wrote:
> On 4/9/08, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> > Hi Reinhold,
>
> > On Wed, Apr 09, 2008 at 05:45:54PM +0200, Reinhold Kliegl wrote:
> > > I think this is a reasonable summary.
>
> > > You were not clear on how you plan to use the conditional modes (i.e.,
> > > your point 1). Please keep in mind that conditional modes are not
> > > independent "observations" like a group mean or within-group effect or
> > > slope, simply because shrinkage correction uses all data. Also, for
> > > example, their correlations (i.e., between intercept and x for units
> > > of C) are typically not identical to the estimated model correlations
> > > displayed in the random-effects part (see also the Bates quote in my
> > > last comment).
>
> > > In analyses of reaction times (using subjects and items as crossed
> > > random factors; carried out with Mike Masson and Eike Richter, 2007),
> > > model-based estimates of correlations among random effects revealed
> > > "clearer" patterns than the correlations between means and effects
> > > computed for each subject (as they should, given that they were
> > > corrected for unreliability). Unlike for fixed-effects estimates,
> > > however, estimates of correlations among random effects were quite
> > > susceptible to violations of distributional assumptions for the
> > > residuals--up to a change in the sign of the correlation!
>
> > This is a very interesting observation, and one that I suspect should
> > not be buried in an email. Can you tell us more about it? In my
> > workshops, I spend a lot of time focusing on the use of diagnostics to
> > check distributional assumptions. It would be fabulous to be able to
> > identify a case study in which getting the distributional assumptions
> > right was so clearly important.
>
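A minimal sketch of the kind of distributional checks mentioned above, assuming the sleepstudy data that ship with lme4 as a stand-in (this is not the reaction-time data Reinhold refers to). It looks at the within-group residuals and at the conditional modes of the random effects; the model formula and the object names fm, r and cm are illustrative choices only.

```r
library(lme4)

# Illustrative model only: sleepstudy is distributed with lme4.
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Within-group residuals: check normality and constant spread.
r <- resid(fm)
qqnorm(r, main = "Residuals")
qqline(r)
plot(fitted(fm), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Conditional modes of the random effects: one normal Q-Q plot per term.
cm <- ranef(fm)$Subject
for (nm in names(cm)) {
  qqnorm(cm[[nm]], main = paste("Conditional modes:", nm))
  qqline(cm[[nm]])
}
```

Heavy tails or a funnel shape in the residual plots would be the cue to try a transformation of the response and to see whether the estimated random-effects correlations change.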
> > More generally, I wonder if it might be worth collecting such a set of
> > case studies with clear and thorough analyses and wrapping them in a
> > document. It seems to me that it would answer the request made by
> > Iasonas Lamprianou recently.
>
> > I'd be happy to coordinate such an effort, so long as the
> > contributions were in LaTeX and Sweave. I know my students would
> > benefit from it :)
>
> > Is there any interest in such an idea, from potential contributors or
> > (equally importantly) potential users?
>
>
> I certainly would be delighted to have such a collection made
> available and would be happy to have it hosted on
> http://lme4.r-forge.r-project.org/ if that seemed suitable.
>
> I would also recommend some of the examples in chapter 7 of Haarald
(Sorry Harald - I got carried away doubling the a's in your name.)
> Baayen's new book "Analyzing Linguistic Data: A Practical Introduction
> to Statistics using R"
>
> # Paperback: 368 pages
> # Publisher: Cambridge University Press; 1 edition (March 17, 2008)
> # Language: English
> # ISBN-10: 0521709180
> # ISBN-13: 978-0521709187
>
>
>
> > > As far as
> > > the use of conditional modes is concerned, the absolute values of
> > > correlations between conditional modes were always larger than the
> > > corresponding model estimates.
> > > In simulations, the model estimates recovered the "true" variances
> > > and correlations, even after random deletion of 50% of the data. The
> > > variance of the conditional modes, however, always underestimated the
> > > true variance, and the difference between the model estimate and the
> > > correlation based on conditional modes increased with the absolute
> > > magnitude of the correlation. In other words, conditional
> > > modes underestimated the variance and exaggerated covariances and
> > > correlations of random effects in these simulations. The shrinkage in
> > > variance reflects the contribution of the likelihood in the
> > > computation of the conditional modes. In summary, according to these
> > > simulations, the model estimates of correlations among random effects
> > > are fine; the computed correlations based on conditional modes may
> > > serve a useful heuristic function for further analyses but must be
> > > handled with care.
> > >
> > > Best
> > > Reinhold
> > >
> > > On Wed, Apr 9, 2008 at 11:21 AM, Nick Isaac <njbisaac at googlemail.com> wrote:
> > > > Dear all,
> > > >
> > > > Thanks for the comments and apologies for not providing more
> > > > information. I (mis)judged it would be better to discuss the issue
> > > > abstractly. There should be enough levels to estimate the variance of
> > > > C and at least one other random effect:
> > > >
> > > > Number of obs: 1242, groups: D, 269; C, 64; B, 8; A, 3
> > > >
> > > > My interpretation of comments by all three respondents is as follows:
> > > > 1) extracting the random effects/BLUPs/conditional modes is reasonable
> > > > in general
> > > > 2) a taxonomy might be considered fixed or random, depending on the
> > > > question and the number of units/levels
> > > > 3) In my case, it would be better to use the conditional modes for x|C
> > > > than to fit x*C as an interaction term.
> > > >
> > > > Best wishes, Nick
> > > >
> > > >
> > > >
> > > >
> > > > On 08/04/2008, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> > > > > On Tue, Apr 08, 2008 at 07:10:16PM +0200, Reinhold Kliegl wrote:
> > > > > > > My dataset has one continuous normally-distributed fixed effect and
> > > > > > > four random effects that are nested (in fact, it is a taxonomy). For
> > > > > > > simplicity, I've removed the variable names, so the dataset has the
> > > > > > > following structure:
> > > > > > >
> > > > > > > y ~ x | A/B/C/D
> > > > > > It would be good to know how many units/levels you have for each of
> > > > > > your four random effects. Those with fewer than, say, five, are good
> > > > > > candidates for being specified as fixed effects. Think how many
> > > > > > observations you need to get a stable estimate of a variance!
> > > > > >
> > > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + C + x:C) #error:
> > > > > > > Downdated X'X is not positive definite, 82
> > > > > > You cannot include C both as a random and a fixed effect
> > > > >
> > > > >
> > > > >
> > > > > I do not believe that this is generally true. See, for example,
> > > > >
> > > > > > require(lme4)
> > > > > > (fm1 <- lmer(Reaction ~ Days + Subject + (Days|Subject), sleepstudy))
> > > > >
> > > > > Therefore I am uncertain as to how you can draw this conclusion
> > > > > without more information about the design (which the poster really
> > > > > should have provided).
> > > > >
> > > > >
> > > > >
> > > > > > > lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + x:C) #gives sensible results
> > > > > > If this gives sensible results, I suspect you have very few levels of
> > > > > > C, say, 2 or 3?
> > > > > > In this case, definitely specify C and x and their interaction as
> > > > > > fixed effects, e.g.:
> > > > > > > lmer( y ~ x*C + (1|A) + (1|B) + (1|D) )
> > > > > >
> > > > > > The following may not apply to your case, but it might: Sometimes
> > > > > > people think that a nested/taxonomic design implies a random effect
> > > > > > structure (e.g., schools, classes, students). This is not true. If you
> > > > > > > have only a few units for each factor, you are better off specifying
> > > > > > > it as a fixed-effects rather than a random-effects taxonomy. (Of
> > > > > > > course, you lose generalizability, but if you want this you should
> > > > > > > make sure you have a sample that provides a basis for it.)
> > > > >
> > > > >
> > > > > I can see the sense behind this position but sometimes a few units are
> > > > > all that is available, and including them in a model as fixed effects
> > > > > muddies the statistical waters, especially if they are the kinds of
> > > > > effects that a model user will be unlikely to naturally condition upon.
> > > > >
> > > > > I do agree that if there are problems with model fitting and/or
> > > > > interpretation when the design is rigorously followed, then a more
> > > > > flexible approach can and should be adopted, and appropriate
> > > > > allowances must be made.
> > > > >
> > > > >
> > > > > > > The interpretation of conditional modes (formerly known as BLUPs,
> > > > > > > that is, "predictions") is a tricky business, especially with few
> > > > > > > units per level.
> > > > >
> > > > >
> > > > > Sorry, I think I've missed something. In what sense are the
> > > > > conditional modes formerly known as BLUPs?
> > > > >
> > > > > Andrew
> > > > >
> > > > >
> > > > > --
> > > > > Andrew Robinson
> > > > > Department of Mathematics and Statistics Tel: +61-3-8344-6410
> > > > > University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
> > > > > http://www.ms.unimelb.edu.au/~andrewpr
> > > > > http://blogs.mbs.edu/fishing-in-the-bay/
> > > > >
> > > >
> >