[R-sig-ME] DF in lme
Ben Bolker
bbolker at gmail.com
Wed Mar 16 16:08:29 CET 2011
On 11-03-16 03:52 AM, Ben Ward wrote:
> Hi, I'm using lme and lmer in my dissertation and it's the first time
> I've used these methods. Taking into account replies from my previous
> query I decided to go through with a model simplification, and then try
> to validate the models in various ways and come up with the best one to
> include in my work, be it a linear mixed effects model or general linear
> effects model, with log() data or not etc - interestingly it does not
> seems like doing transofrmations and such makes much difference so far,
> looking at changes in diagnostic plots and AIC.
Be careful about comparing fits of transformed and non-transformed
data via AIC/log-likelihood: e.g. see
<http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm>.
(This does *not* refer to the link function, e.g. the log link of the
Poisson, but to the case where you transform your data prior to analysis.)
> Anywho, I simplified to the model using lme (I've pasted it at the
> bottom). And looking at the anova output the numDF looks right. However
> I'm concerned about the 342 df in the denDF in anova() and in the
> summary() output, as it seems to high to me, because at the observation
> level is too high and pseudoreplicated; 4 readings per disk, 3 disks,
> per plate, 3 plates per lineage, 5 lineages per group, 2 groups so:
> 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish
> level it's 3*5*2=30 degrees of freedom for error. And either dish or
> disk (arguments for both) is the level at which one independant point of
> datum is obtained, most probably Dish. So I'm wondering if either I'de
> done something wrong, or I'm not understanding how df are presented and
> used in mixed models. It's not really explained in my texts, and my
> lecturer told me I'm working at the edge of his personal/professional
> experience.
At what level are Group and Lineage replicated in the model? Do you
have different Groups or Lineages represented on the same disk, dish, or
plate? If you do have multiple Groups and Lineages present at the
lowest level of replication, then you have a randomized block design and
the degrees of freedom may be higher than you think. If you really want
denominator degrees of freedom and you want them correct, consult an
experimental design book and figure out what they should be in the
classical framework ...
> I've used lmer and the function in languageR to extract p-values without
> it even mentioning df. Now if the lmer method with pvals.fnc() makes it
> so as I don't have to worry about these df then in a way it makes my
> issue a bit redundant. But it is playing on my mind a bit so felt I
> should ask.
>
> My second question is about when I do the equivalent model using lmer:
> "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which
> I'm sure does the same because all my plots of residuals against fitted
> and such are the same, if I define it with the poisson family, which
> uses log, then I get a much lower AIC of about 45, compared to over 1000
> without family defined, which I think defaults to gaussian/normal.
I don't think you should try to pick the family on the basis of AIC --
you should pick it on the basis of the qualitative nature of the data.
If you have count data, you should probably use Poisson (but you may
want to add an observation-level random effect to allow for
overdispersion.) If your response variable is Diameter, it is **not** a
count variable, and you shouldn't use Poisson -- you should use an
appropriately transformed response variable.
And
> my diagnostic plots still give me all the same patters, but just looking
> a bit different because of the family distribution specified. I then did
> a model logging the response variable by using log(Diameter), again, I
> get the same diagnostic plot patterns, but on a different scale, and I
> get an AIC of - 795.6. Now normally I'd go for the model with the lowest
> AIC, however, I've never observed this beahviour before, and can't help
> but think thhat the shift from a posotive 1000+ AIC to a negative one is
> due to the fact the data has been logged, rather than that the model
> fitted to log data in this way is genuinley better.
>
> Finally, I saw in a text, an example of using lmer but "Recoding Factor
> Levels" like:
> lineage<-Group:Lineage
> dish<-Group:Lineage:Dish
> disk<-Group:Lineage:Dish:Disk
> model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk)
>
> However I don't see why this should need to be done, considering, the
> study was hieracheal, just like all other examples in that chapter, and
> it does not give a reason why, but says it does the same job as a nested
> anova, which I though mixed models did anyway.
(1|lineage)+(1|dish)+(1|disk)
is the same as
(1|Lineage/Dish/Disk)
(1|Dish) + (1|Disk) is **not** the same as (1|Dish/Disk), if Disk is
not labeled uniquely (i.e. if Dishes are A, B, C, .. and Disks are 1, 2,
3, ... then you need Dish/Disk. If you have labeled Disks A1, A2, ...
B1, B2, ... then the specifications are equivalent.
For a linear mixed model (i.e. not Poisson counts) you should be able
to run the same model in lmer and lme and get extremely similar results.
>
> Hopefully sombody can shed light on my concerns. In terms of my work and
> university, I could include what I've done here and be as transparrant
> as possible and discuss these issues, because log() of the data or
> defining a distribution in the model is leading to the same plots and
> conclusions. But I'd like to make sure I come to term with what's
> actually happening here.
>
> A million thanks,
> Ben W.
>
>
> lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset,
> method="REML")
>
>>anova(lme14):
>
> numDF denDF F-value p-value
> (Intercept) 1 342 16538.253 <.0001
> Group 1 342 260.793 <.0001
> Lineage 4 342 8.226 <.0001
> Group:Lineage 4 342 9.473 <.0001
>
>> summary(lme14)
> Linear mixed-effects model fit by REML
> Data: Dataset
> AIC BIC logLik
> 1148.317 1198.470 -561.1587
>
> Random effects:
> Formula: ~1 | Dish
> (Intercept)
> StdDev: 0.1887527
>
> Formula: ~1 | Disk %in% Dish
> (Intercept) Residual
> StdDev: 6.303059e-05 1.137701
>
> Fixed effects: Diameter ~ Group * Lineage
> Value Std.Error DF t-value
> p-value
> (Intercept) 15.049722 0.2187016 342 68.81396
> 0.0000
> Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
> 0.0003
> Lineage[T.First] -0.116389 0.2681586 342 -0.43403
> 0.6645
> Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
> 0.8872
> Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
> 0.5085
> Lineage[T.Third] 0.221111 0.2681586 342 0.82455
> 0.4102
> Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
> 0.0000
> Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
> 0.0122
> Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
> 0.0296
> Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
> 0.0579
> Correlation:
> (Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
> Group[T.NEDettol] -0.613
> Lineage[T.First] -0.613 0.500
> Lineage[T.Fourth] -0.613 0.500 0.500
> Lineage[T.Second] -0.613 0.500 0.500 0.500
> Lineage[T.Third] -0.613 0.500 0.500 0.500
> Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
> Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
> Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
> Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
> L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
> Group[T.NEDettol]
> Lineage[T.First]
> Lineage[T.Fourth]
> Lineage[T.Second]
> Lineage[T.Third] 0.500
> Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
> Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
> Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
> Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
> Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
> Group[T.NEDettol]
> Lineage[T.First]
> Lineage[T.Fourth]
> Lineage[T.Second]
> Lineage[T.Third]
> Group[T.NEDettol]:Lineage[T.First]
> Group[T.NEDettol]:Lineage[T.Fourth]
> Group[T.NEDettol]:Lineage[T.Second] 0.500
> Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
>
> Standardized Within-Group Residuals:
> Min Q1 Med Q3 Max
> -2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
>
> Number of Observations: 360
> Number of Groups:
> Dish Disk %in% Dish
> 3 9
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list