[R-sig-ME] CHOLMOD error in lmer-specifying nested random effects

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Tue Aug 11 00:35:51 CEST 2009


Le lundi 10 août 2009 à 10:28 -0400, Eli Swanson a écrit :
> Hi Emmanuel,
> 
>     Ok, point very well taken, all my values are integers, but you are
> correct that they were discretized by the measurement process.
> 
>     In that case, is there any input on what the current state of
> negative binomial distributions is?  My understanding is that one can
> use lmer as long as you specify a theta value, or use glmmADMB.  Are
> these approaches still valid?  The conversations I found on the boards
> were a couple years old.  Do you think I would I be better off using
> one of the transformations you suggest?

My objection to Poisson also applies to negative binomial (a discrete
distribution, last time I looked).

As to the transformation to use, I rummaged your previous posts and came
with  possible alternative :

The difficulty is that your observation is indeed a proportion, but not,
strictly speaking, a counting process. In your original posting, you
stated : "The response variable is the number of minutes a cub was
observed nursing (Nurse.min).", which I'm tempted to rewrite as "The
response variable is the *duration* a cub was observed nursing
(Nurse.t)." This duration is bound *twice* : on the left by 0, on the
right by the *total duration* of your observation (say, Obs.t).

I *think* that your variable of interest might be better expressed as
Nurse.frac=Nurse.t/Obs.t, which is bound by 0 nd 1, like a proportion.
Here, the sins of my past return to haunt me and sorely tempt me to
treat it a bit like a logistic regression problem :

What happens when you study log(Nurse.frac/(1-Nurse.frac)) in a *linear*
(mixed) model ?

Note, however, you *cannot* treat it as a "real" logistic regression
problem : the variance of this fraction is *not* a simple function of
Nurse.frac and Obs.t (your notional "sample size") ; here you have *one*
observation per unit, not Obs.mins. (To "see" it better, think of the
different results you would get when expressing your times in minutes
and in seconds...). Whereas in Poisson or logistic regression you have
to asses only one parameter (\pi or \lambda), here, you have two : \mu
and \sigma, the later being a "nuisance" parameter (which can indeed be
real nuisance : look out for heteroscedasticity).

Therefore, use lmer(..., family=gaussian) (or maybe lme(), which would
allow you to model heteroscedasticity), rather than lmer(...,
family=binomial).

Hope this helps,

						Emmanuel Charpentier

> Thanks!
> Eli
> 
> 
> 
> 
> -- 
> Eli Swanson
> Department of Zoology
> Ecology, Evolutionary Biology, and Behavior Program
> Michigan State University
> 
> 
> On Mon, Aug 10, 2009 at 3:51 AM, Emmanuel
> Charpentier<charpent at bacbuc.dyndns.org> wrote:
> >
> >
> > Dear Luca, dear list,
> >
> > Sorry for buttin' in after the fact, but I wonder why you want to use a
> > (quasi-)Poisson model ? Unless I misunderstand your problem, your
> > dependent variable is, in essence, continuous, even if discretized by
> > the measurement process : when you record, for example, 5 minutes, that
> > means that the time (in mins) is some *real* value t in [4.5 5.5). Using
> > a model based on a discrete distribution does not make sense to me ;
> > ditto for "overdispersion".
> >
> > Of course, your data may have a non-normal residuals' distribution
> > and/or heteroscedasticity. That means that you may have to use a
> > variable transformation such as log, or something more sophisticated
> > such as Box & Cox or logtrans transformation : see V&R4, for example.
> > MASS has a couple of functions for determination of the "optimal" Box &
> > Cox or logtrans parameter of the dependent variable transformation in
> > fixed effect models ; ISTR that the car package has something more
> > sophisticate, proposing transformation of the dependent and independent
> > variables ; ISTR also that there exist a couple of packages on CRAN
> > dedicated to this class of problems.
> >
> > However, as far as I know, all these functions are written for
> > fixed-effects models. You may have to adapt them to random effects
> > models and/or find the optimal transformations on analogous fixed
> > effects models. You may also have to settle for a "reasonable"
> > transformation on general principles and check a posteriori that your
> > residuals are not (wildly)non-normal or heteroscedastic (ditto for the
> > random effects) ... and remember that the linear model is somewhat
> > robust to (small) deviations from these preconditions.
> >
> > Last resort : bootstrap and permutation tests, but this is
> > non-trivial... Maybe a look at the "coin" package may be useful.
> >
> > Hope this helps (even a bit late),
> >
> >                                        Emmanuel Charpentier
> >
> > Le vendredi 07 août 2009 à 15:44 -0400, Eli Swanson a écrit :
> >> Hi,
> >> Thanks Luca, number 1 certainly makes sense to me- i only had sex
> >> nested within cub like that as a random effect due to something i'd
> >> read in a previous post.
> >>
> >> Your answer number 2 was really helpful, I did not know that.  I do in
> >> fact already have a unique ID for both each individual mom and each
> >> individual cub.  Recoding this did not instantly solve my problem, but
> >> mean centering and standardizing instantly fixed it.  My new model:
> >> ger<-lmer(Nurse.min~Mom.mins+Mom.rank+Sex+(1|Cub)+(1|Mom),data=data,family=quasipoisson)
> >>
> >> Is mean centering and standardizing totally valid in this case?  I
> >> dont know why it wouldnt be, it just seems, i dont know, a very simple
> >> fix for an ongoing problem ive been having.
> >>
> >> I have tried to read all the old posts about overdispersion, and in
> >> fact have calculated other values that have been suggested, as Doug
> >> said that sigma probably wasn't valid, ie.
> >> ger at deviance["pwrss"]/ger at dims["n"]
> >>    pwrss
> >> 6.933783
> >>
> >> But, I can't find anywhere what the meaning is of deviance/n .  Is it
> >> an estimate of the scale parameter?  In that case my data is clearly
> >> overdispersed, but I don't have a meaningful scale on which to think
> >> about this that I know is accurate.  I may have missed some of the
> >> info however, so I'll go back and read through the old posts again.
> >>
> >> Thanks again!
> >> Eli
> >>
> >> On Fri, Aug 7, 2009 at 2:31 PM, Luca Borger<lborger at uoguelph.ca> wrote:
> >> > Hello,
> >> >
> >> > a quick reply to get this started (you may get more expert feedback by other members of the list).
> >> >
> >> > 1. I'd expect sex to have an effect on nursing time (longer time for males?). Sex is definitively a fixed effect (see also the mailing list archives).
> >> >
> >> > 2. Specify a unique ID code for each cub and for each mother, then lmer will work out the nesting without the need to specify it explicitly (see posts by Doug Bates on this issue).
> >> >
> >> >
> >> > Thus I'd try something like:
> >> >
> >> >
> >> > ger<-lmer(Nurse.min~Mom.min+Mom.rank+Sex+(1|Cub)+(1|Mom),data=data,family=poisson,control=list(msVerbose=T))
> >> >
> >> >
> >> > If you don't have multiple cubs for each mother it might be that a model with only cubID might be more appropriate (but am guessing here). If the warning persists try also centrering/standardizing Mom.min.
> >> >
> >> > Re the overdispersion issue have a look at the mailing list archive (e.g. postings by Ben Bolker).
> >> >
> >> >
> >> > HTH
> >> >
> >> >
> >> >
> >> > Cheers,
> >> >
> >> >
> >> > Luca
> >> >
> >> >
> >> > ----- Messaggio originale -----
> >> > Da: "Eli Swanson" <eliswanson at gmail.com>
> >> > A: r-sig-mixed-models at r-project.org
> >> > Inviato: Venerdì, 7 agosto 2009 18:17:06 GMT +01:00 Amsterdam/Berlino/Berna/Roma/Stoccolma/Vienna
> >> > Oggetto: [R-sig-ME] CHOLMOD error in lmer-specifying nested random effects
> >> >
> >> > Hi all,
> >> >
> >> >
> >> >
> >> > I'm working on an analyzing some data right now that's causing me some
> >> > difficulties.  I have 2 questions, and would really appreciate any
> >> > advice that people can offer.  I don't post data because there are
> >> > over 2000 cases.  sessionInfo() is posted at the end of my post.
> >> >
> >> >
> >> >
> >> > The data was collected during focal animal surveys (FAS).  The
> >> > response variable is the number of minutes a cub was observed nursing
> >> > (Nurse.min).  The fixed predictors are 1) the number of minutes the
> >> > mother was present during the FAS (Mom.min), 2) the mother's social
> >> > rank (Mom.rank), and 3)sex of cub(Sex).  The random effects are 1)
> >> > Identity of mom, and 2) identity of cub, as there are a large number
> >> > of observations for every mother, and most of the cubs.  Cub is nested
> >> > within mom, and when I include sex in the model, my understanding is
> >> > that I have to nest it within cub.
> >> >
> >> > Question the first:
> >> > I try to create the following model, leaving out sex for the moment,
> >> > because my understanding is that with only two cases, sex complicates
> >> > matters:
> >> >
> >> > ger<-lmer(Nurse.min~Mom.min+Mom.rank+(1|Mom:Cub)+(1|Mom),data=data,family=poisson,control=list(msVerbose=T))
> >> >
> >> > Is this model correctly specified?  I receive a CHOLMOD warning, as follows:
> >> >
> >> >   11918.870: 0.289063 0.172748 -0.0419576 0.0575048 0.00979307
> >> > CHOLMOD warning: 7u‡e_
> >> > Error in mer_finalize(ans) :
> >> >  Cholmod error `not positive definite' at
> >> > file:../Cholesky/t_cholmod_rowfac.c, line 432
> >> >
> >> > When I try to include Sex as a fixed effect, I'm even less sure I'm
> >> > specifying the model correctly but my model looks like this:
> >> >
> >> > ger<-lmer(Nurse.min~Mom.min+Mom.rank+Sex+(1|Mom:Cub:Sex)+(1|Mom),data=data,family=poisson,control=list(msVerbose=T))
> >> >
> >> > And my output:
> >> >  0:     11926.160: 0.289063 0.172748 0.172336 0.0572882 0.00311534
> >> > -0.244655 -0.683584
> >> > CHOLMOD warning: 7u‡e_
> >> > Error in mer_finalize(ans) :
> >> >  Cholmod error `not positive definite' at
> >> > file:../Cholesky/t_cholmod_rowfac.c, line 432
> >> >
> >> > The model works if I specify it as follows, but I can't get it to work
> >> > with sex, and I think I'm specifying the random effects incorrectly
> >> > here:
> >> >
> >> > ger<-lmer(Nurse.min~Mom.min+Mom.rank+(1|Mom:Cub),data=data,family=poisson)
> >> >
> >> > summary(ger)
> >> > Generalized linear mixed model fit by the Laplace approximation
> >> > Formula: Nurse.min ~ Mom.min + Mom.rank + (1 | Mom:Cub)
> >> >   Data: data
> >> >   AIC   BIC logLik deviance
> >> >  11381 11404  -5687    11373
> >> > Random effects:
> >> >  Groups  Name        Variance Std.Dev.
> >> >  Mom:Cub (Intercept) 2.1313   1.4599
> >> > Number of obs: 2234, groups: Mom:Cub, 70
> >> >
> >> > Fixed effects:
> >> >              Estimate Std. Error z value Pr(>|z|)
> >> > (Intercept) -1.4669098  0.1962970   -7.47 7.84e-14 ***
> >> > Mom.min      0.0688668  0.0007541   91.32  < 2e-16 ***
> >> > Mom.rank     0.0676121  0.0069510    9.73  < 2e-16 ***
> >> > ---
> >> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >> >
> >> > Correlation of Fixed Effects:
> >> >         (Intr) Mom.mn
> >> > Mom.min  -0.134
> >> > Mom.rank -0.332  0.115
> >> >
> >> >
> >> >
> >> > So, am I specifying these models correctly?  What could be the
> >> > problem?  I only have 1 observation for some of the cubs, and I
> >> > thought this might be the problem, but when I remove these cubs, the
> >> > error remains.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >  I'm not sure if it makes a difference, but I do have slightly
> >> > overdispersed data (its 0-inflated i think, but not hugely so), hence
> >> > my second question:
> >> >
> >> > When I use the quasipoisson family to estimate the model like so
> >> > (specifying the model the only way i could get it to work, even though
> >> > this may be incorrect)
> >> > :
> >> >  ger<-lmer(Nurse.min~Mom.min+Mom.rank+(1|Mom:Cub),data=data,family=quasipoisson,control=list(msVerbose=T))
> >> >
> >> > Then, by my understanding, estimate the degree of overdispersion:
> >> >
> >> > lme4:::sigma(ger)
> >> >  sigmaML
> >> > 1.233926
> >> >
> >> > How badly overdispersed is this?  I tried using MCMCglmm with a
> >> > "zipoisson", but its giving me strange results (in fact, estimates for
> >> > the fixed effects that appear to have an opposite sign of those I find
> >> > with lmer).  As I'm not an expert in Bayesian methods, I assume I'm
> >> > specifying something wrong there and would prefer to stick with lmer
> >> > for now if possible.  Can I just continue to use quasipoisson?  I know
> >> > that the standard errors are inflated, but if im not worried about
> >> > that does this mean that parameters are correctly estimated?
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > sessionInfo()
> >> > R version 2.9.0 (2009-04-17)
> >> > i386-pc-mingw32
> >> >
> >> > locale:
> >> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> >> > States.1252;LC_MONETARY=English_United
> >> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >> >
> >> > attached base packages:
> >> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >
> >> > other attached packages:
> >> > [1] lme4_0.999375-28   Matrix_0.999375-24 lattice_0.17-22
> >> >
> >> > loaded via a namespace (and not attached):
> >> > [1] grid_2.9.0
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > i would appreciate any help that people can offer,
> >> > Thank you very much,
> >> > Eli
> >> >
> >> >
> >> >
> >> > --
> >> > Eli Swanson
> >> > Department of Zoology
> >> > Ecology, Evolutionary Biology, and Behavior Program
> >> > Michigan State University
> >> >
> >> > _______________________________________________
> >> > R-sig-mixed-models at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >> >
> >>
> >>
> >>
> >> --
> >> Eli Swanson
> >> Department of Zoology
> >> Ecology, Evolutionary Biology, and Behavior Program
> >> Michigan State University
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> >
> >
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




More information about the R-sig-mixed-models mailing list