[R-sig-ME] Distributional assumptions + case studies (was: Random or Fixed effects appropriate?)

Thu Apr 10 00:06:10 CEST 2008

Hi Reinhold,

On Wed, Apr 09, 2008 at 05:45:54PM +0200, Reinhold Kliegl wrote:
> I think this is a reasonable summary.
> 
> You were not clear on how you plan to use the conditional modes (i.e.,
> your point 1).  Please keep in mind that conditional modes are not
> independent "observations" like a group mean or within-group effect or
> slope, simply because shrinkage correction uses all data. Also, for
> example, their correlations (i.e., between intercept and x for units
> of C) are typically not identical to the estimated model correlations
> displayed in the random-effects part (see also the Bates quote in my
> last comment).
> 
> In analyses of reaction times (using subjects and items as crossed
> random factors; carried out with Mike Masson and Eike Richter, 2007),
> model-based estimates of correlations among random effects revealed
> "clearer" patterns than the correlations between means and effects
> computed for each subject (as they should, given that they were
> corrected for unreliability). Unlike for fixed-effects estimates,
> however, estimates of correlations among random effects were quite
> susceptible to violations of distributional assumptions for the
> residuals--up to a change in the sign of the correlation! 

This is a very interesting observation, and one that I suspect should
not be buried in an email.  Can you tell us more about it?  In my
workshops, I spend a lot of time focusing on the use of diagnostics to
check distributional assumptions.  It would be fabulous to be able to
identify a case study in which getting the distributional assumptions
right was so clearly important.
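
For anyone following along, here is a minimal sketch of the kind of
checks I have in mind.  It uses the sleepstudy data that ships with
lme4 purely as a stand-in (it is not Reinhold's reaction-time data),
and the particular plots are my own choice of illustration rather
than a complete diagnostic recipe.

```r
library(lme4)

# Stand-in model: random intercept and slope per subject (sleepstudy ships with lme4).
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Residuals against fitted values: look for funnels or curvature.
plot(fitted(fm), resid(fm), xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal quantile plot of the within-subject residuals.
qqnorm(resid(fm))
qqline(resid(fm))

# Normal quantile plots of the conditional modes, one panel per random effect.
re <- ranef(fm)$Subject
op <- par(mfrow = c(1, ncol(re)))
for (nm in names(re)) {
  qqnorm(re[[nm]], main = nm)
  qqline(re[[nm]])
}
par(op)  # restore the previous plotting layout
```

Heavy tails or skew in the residual plot would be the cue to try a
transformation of the response and then see whether the estimated
random-effects correlations move.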

More generally, I wonder if it might be worth collecting such a set of
case studies with clear and thorough analyses and wrapping them in a
document.  It seems to me that it would answer the request made by
Iasonas Lamprianou recently.

I'd be happy to coordinate such an effort, so long as the
contributions were in LaTeX and Sweave.  I know my students would
benefit from it :)

Is there any interest in such an idea, from potential contributors or
(equally importantly) potential users?

Cheers

Andrew

> As far as
> the use of conditional modes is concerned, the absolute values of
> correlations between conditional modes were always larger than the
> corresponding model estimates.
>      In simulations, the model estimates of correlations recovered the
> "true" variances and correlations, even after random deletion of 50%
> of the data, but the variance of the conditional modes always
> underestimated the true variance and the difference between model
> estimate and correlation based on conditional modes increased with the
> absolute magnitude of the correlation. In other words, conditional
> modes underestimated the variance and exaggerated covariances and
> correlations of random effects in these simulations. The shrinkage in
> variance reflects the contribution of the likelihood in the
> computation of the conditional modes.  In summary, according to these
> simulations, the model estimates of correlations among random effects
> are fine; the computed correlations based on conditional modes may
> serve a useful heuristic function for further analyses but must be
> handled with care.
> 
> Best
> Reinhold
> 
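
A rough sketch of the comparison described above, in case it helps
others try it.  This is not Reinhold's simulation: the sample sizes,
the "true" standard deviations and correlation, and all variable names
are assumptions chosen only for illustration.

```r
library(lme4)
set.seed(1)

# Assumed "true" values for the illustration.
n_subj <- 100; n_obs <- 10
sd_int <- 1; sd_slope <- 0.5; rho <- 0.5; sd_res <- 1

# Correlated intercept and slope deviations, built from independent normals.
z1 <- rnorm(n_subj); z2 <- rnorm(n_subj)
b0 <- sd_int * z1
b1 <- sd_slope * (rho * z1 + sqrt(1 - rho^2) * z2)

# One row per subject-by-x combination.
d <- expand.grid(x = seq_len(n_obs) - 1, subj = seq_len(n_subj))
d$y <- 2 + b0[d$subj] + (0.3 + b1[d$subj]) * d$x +
  rnorm(nrow(d), sd = sd_res)
d$subj <- factor(d$subj)

fm <- lmer(y ~ x + (x | subj), data = d)

# Model-based estimates of the random-effects SDs and correlation.
vc <- VarCorr(fm)$subj
model_sd  <- attr(vc, "stddev")
model_cor <- attr(vc, "correlation")[1, 2]

# The same summaries computed naively from the conditional modes.
re <- ranef(fm)$subj
mode_sd  <- sapply(re, sd)
mode_cor <- cor(re[[1]], re[[2]])

comparison <- rbind(model = c(model_sd, cor = model_cor),
                    modes = c(mode_sd,  cor = mode_cor))
print(comparison)
```

Setting rho to other values, or deleting rows of d at random before
fitting, is a simple way to see how far the two rows drift apart.
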
> On Wed, Apr 9, 2008 at 11:21 AM, Nick Isaac <njbisaac at googlemail.com> wrote:
> > Dear all,
> >
> >  Thanks for the comments and apologies for not providing more
> >  information. I (mis)judged it would be better to discuss the issue
> >  abstractly. There should be enough levels to estimate the variance of
> >  C and at least one other random effect:
> >
> >  Number of obs: 1242, groups: D, 269; C, 64; B, 8; A, 3
> >
> >  My interpretation of comments by all three respondents is as follows:
> >  1) extracting the random effects/BLUPs/conditional modes is reasonable
> >  in general
> >  2) a taxonomy might be considered fixed or random, depending on the
> >  question and the number of units/levels
> >  3) In my case, it would be better to use the conditional modes for x|C
> >  than to fit x*C as an interaction term.
> >
> >  Best wishes, Nick
> >
> >
> >
> >
> >  On 08/04/2008, Andrew Robinson <A.Robinson at ms.unimelb.edu.au> wrote:
> >  > On Tue, Apr 08, 2008 at 07:10:16PM +0200, Reinhold Kliegl wrote:
> >  >  > >  My dataset has one continuous normally-distributed fixed effect and
> >  >  > >  four random effects that are nested (in fact, it is a taxonomy). For
> >  >  > >  simplicity, I've removed the variable names, so the dataset has the
> >  >  > >  following structure:
> >  >  > >
> >  >  > >  y ~ x | A/B/C/D
> >  >  > It would be good to know how many units/levels you have for each of
> >  >  > your four random effects. Those with fewer than, say, five, are good
> >  >  > candidates for being specified as fixed effects. Think how many
> >  >  > observations you need to get a stable estimate of a variance!
> >  >  >
> >  >  > >  lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + C + x:C) #error:
> >  >  > >  Downdated X'X is not positive definite, 82
> >  >  > You cannot include C both as a random and a fixed effect
> >  >
> >  >
> >  >
> >  > I do not believe that this is generally true.  See, for example,
> >  >
> >  >  > require(lme4)
> >  >  > (fm1 <- lmer(Reaction ~ Days + Subject + (Days|Subject),  sleepstudy))
> >  >
> >  >  Therefore I am uncertain as to how you can draw this conclusion
> >  >  without more information about the design (which the poster really
> >  >  should have provided).
> >  >
> >  >
> >  >
> >  >  > >  lmer( y ~ x + (1|A) + (1|B) + (1|C) + (1|D) + x:C) #gives sensible results
> >  >  > If this gives sensible results, I suspect you have very few levels of
> >  >  > C, say, 2 or 3?
> >  >  > In this case, definitely specify C and x and their interaction as
> >  >  > fixed effects, e.g.:
> >  >  > lmer( y ~ x*C + (1|A) + (1|B)  + (1|D) )
> >  >  >
> >  >  > The following may not apply to your case, but it might: Sometimes
> >  >  > people think that a nested/taxonomic design implies a random effect
> >  >  > structure (e.g., schools, classes, students). This is not true. If you
> >  >  > have only a few units for each factor, you are better off specifying
> >  >  > it as a fixed-effects rather than a random-effects taxonomy. (Of
> >  >  > course, you lose generalizability, but if you want this you should
> >  >  > make sure you have a sample that provides a basis for it.)
> >  >
> >  >
> >  > I can see the sense behind this position but sometimes a few units are
> >  >  all that is available, and including them in a model as fixed effects
> >  >  muddies the statistical waters, especially if they are the kinds of
> >  >  effects that a model user will be unlikely to naturally condition upon.
> >  >
> >  >  I do agree that if there are problems with model fitting and/or
> >  >  interpretation when the design is rigorously followed, then a more
> >  >  flexible approach can and should be adopted, and appropriate
> >  >  allowances must be made.
> >  >
> >  >
> >  >  > The interpretation of conditional modes (formerly known as BLUPs,
> >  >  > that is "predictions") is a tricky business, especially with few
> >  >  > units per level.
> >  >
> >  >
> >  > Sorry, I think I've missed something.  In what sense are the
> >  >  conditional modes formerly known as BLUPs?
> >  >
> >  >  Andrew
> >  >
> >  >
> >  >  --
> >  >  Andrew Robinson
> >  >  Department of Mathematics and Statistics            Tel: +61-3-8344-6410
> >  >  University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
> >  >  http://www.ms.unimelb.edu.au/~andrewpr
> >  >  http://blogs.mbs.edu/fishing-in-the-bay/
> >  >
> >

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/