[R-sig-ME] package nlme

Tue Apr 12 20:40:27 CEST 2011

On 04/07/2011 06:33 AM, Anna Renwick wrote:
> Dear All
> 
> I am currently analysing ecological data looking at the effect of
> agricultural intensity on a range of biodiversity measures. I have data from
> 26 sites grouped into 9 clusters. I am therefore running a linear mixed
> model using the nlme package. I am aware that there is issues in mixed
> models of determing the number of degrees of freedom and hence the p values
> but as these are reported by nlme I have a query I was hoping I might be
> able to resolve.
> 
> The model I have run is below:
> 
> beerichmix1<-lme(MeanBeeSppRich~cultivate, data=bees, random=~1|cluster,
>  method="ML")
> 
> Linear mixed-effects model fit by maximum likelihood
> Data: bees
>       AIC      BIC    logLik
>  165.0555 170.0879 -78.52774
> 
> Random effects:
> Formula: ~1 | cluster
>        (Intercept) Residual
> StdDev:    1.899494 4.636802
> 
> Fixed effects: MeanBeeSppRich ~ cultivate
>               Value Std.Error DF   t-value p-value
> (Intercept) 38.19658  4.267833 16  8.949878  0.0000
> cultivate   -0.22248  0.059514 16 -3.738322  0.0018
> Correlation:
>          (Intr)
> cultivate -0.962
> 
> Standardized Within-Group Residuals:
>       Min         Q1        Med         Q3        Max
> -1.8576001 -0.5935824  0.0859121  0.7933137  1.9665311
> 
> Number of Observations: 26
> Number of Groups: 9
> 
> I can see that the df has been calculated as 26 sites - 9 clusters - 1 = 16.
> However, this is clearly a misrepresentation of the data structure in the
> model and hence so is the p value. Using some sort of correction (i.e.
> Kenward-Roger correction), even if not precise would offer a slightly better
> solution in my circumstance than not correcting at all there fore I was
> wondering if anybody has addressed this in R or if people are just moving
> more and more away from reporting df and p values.
> 

  It's not entirely clear to me what the structure of your data is.
Does cultivation status vary within clusters? (Presumably, since it's a
continuous variable.)  On the basis of classic experimental design/ANOVA
rules, what do you think the df should be, or what is a reasonable
approximation?
  You can read about the rules that lme uses to guess denominator df
("inner-outer method" in PB 2000 (page 91 at
<http://books.google.com/books?id=y54QDUTmvDcC&printsec=frontcover&dq=pinheiro+bates&hl=en&src=bmrr&ei=_ZqkTaaOKMeB0QGWhcnBBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCgQ6AEwAA#v=onepage&q=inner&f=false>):

 if cultivate varies within cluster, its value is "inner" to cluster
"If a term is inner to all Q grouping factors in a model, it is
estimated at the level of the within-group errors, which we denote as
the Q+1st level" -- so cultivate would be estimated at level 2.

  Letting m_i denote the number of groups in level i (with the
convention that m_0=1 when the fixed effect model includes an intercept
and 0 otherwise, and m_{Q+1}=N) and p_i the sum of the degrees of
freedom corresponding to the terms estimated at level i, the i^th level
denominator df is defined as

denDF_i = m_i-(m_{i-1}+p_i), i = 1 ... Q+1

  So in this case

 m_0 = 1
 denDF_2 = m_2 - (m_1 + p_1) = N - (# clusters + # parameters) =
26 - (9 + 1)

 exactly as you have suggested above.

 Can you explain further why this is wrong? (I'm not saying it isn't,
just that I don't understand the problem ...)

  Ben Bolker