[R] question in using nlme and lme4 for unbalanced data

Mon Oct 25 21:41:29 CEST 2010

Chi:

I would say that these are not really R questions, but about
statistical procedures in general. However, there are some important
issues that you touch on, so I cannot resist commenting. Hopefully my
comments might pique others to add theirs -- and perhaps to disagree
with me.

1. Unbalanced data?
Yes that's what they're for. However, you need to  recognize that with
unbalanced data, interpretability of coefficients may be lost (the
estimates depend on what factors you include in your model).

2.Proper analysis?
TRUST in DOUG. (Doug Bates, the author of the packages and an
acknowledged expert in the field).

3. Without P values, how do I I tell what's significant?
Well, first of all, P values/statistical significance are only
meaningful when hypotheses are pre-specified; and even then,
statistical significance is a function of effect magnitude, sample
size, and experimental noise, so usually doesn't accord with
scientific importance anyway unless the experiments have been powered
appropriately, which is rarely the case outside clinical trials. So
statistical significance is probably irrelevant to the scientific
questions iin any case.

I would say that you need to focus on effect (coefficient) sizes and
perhaps do sensitivity analyses to see how much predicted results
change as you change them. Good graphs of your data are also always a
good idea.

4, With only 5 blocks, you have practically no information to estimate
variance anyway. You're probably better off treating them as fixed
effects and just estimating the model via lm. You might find a sqrt or
log transformation of the enfa numbers to be useful, though...

Cheers,
Bert

On Mon, Oct 25, 2010 at 11:44 AM, Chi Yuan <cyuan at email.arizona.edu> wrote:
> Hello:
>  I have an two factorial random block design. It's a ecology
> experiment. My two factors are, guild removal and enfa removal. Both
> are two levels, 0 (no removal), 1 (removal). I have 5 blocks. But
> within each block, it's unbalanced at plot level because I have 5
> plots instead of 4 in each block. Within each block, I have 1 plot
> with only guild removal, 1 plot with only enfa removal, 1 plot for
> control with no removal, 2 plots for both guild and enfa removal. I am
> looking at how these treatment affect the enfa mortality rate. I
> decide to use mixed model to treat block as random effect. So I try
> both nlme and lme4. But I don't know whether they take the unbalanced
> data properly. So my question is, does lme in nlme and lmer in lme4
> take unbalanced data? How do I know it's analysis in a proper way?
> Here is my code and the result for each method.
>  I first try nlme
>  library(nlme)
>  m=lme(enfa_mortality~guild_removal*enfa_removal,random=~1|block,data=com_summer)
>  It gave me the result as following
>  Linear mixed-effects model fit by REML
>  Data: com_summer
>       AIC      BIC   logLik
>  8.552254 14.81939 1.723873
>
> Random effects:
>  Formula: ~1 | block
>         (Intercept)  Residual
> StdDev: 9.722548e-07 0.1880945
>
> Fixed effects: enfa_mortality ~ guild_removal * enfa_removal
>                            Value Std.Error DF   t-value p-value
> (Intercept)                 0.450 0.0841184 17  5.349603  0.0001
> guild_removal              -0.100 0.1189614 17 -0.840609  0.4122
> enfa_removal               -0.368 0.1189614 17 -3.093441  0.0066
> guild_removal:enfa_removal  0.197 0.1573711 17  1.251818  0.2276
>  Correlation:
>                           (Intr) gld_rm enf_rm
> guild_removal              -0.707
> enfa_removal               -0.707  0.500
> guild_removal:enfa_removal  0.535 -0.756 -0.756
>
> Standardized Within-Group Residuals:
>       Min         Q1        Med         Q3        Max
> -1.7650706 -0.7017751  0.1594943  0.7974717  1.9139320
>
> Number of Observations: 25
> Number of Groups: 5
>
> I kind of heard the P value does not matter that much in the mixed
> model. Is there any other way I can tell whether the treatment has a
> significant effect or not?
>  I then try lme4, it give similar result, but won't tell me the p value.
> library(lme4)
> m<-lmer(enfa_mortality ~ guild_removal*enfa_removal +(1|block), data=com_summer)
> here is the result
>  Linear mixed model fit by REML
> Formula: enfa_mortality ~ guild_removal * enfa_removal + (1 | block)
>   Data: com_summer
>   AIC   BIC logLik deviance REMLdev
>  8.552 15.87  1.724   -16.95  -3.448
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  block    (Intercept) 0.000000 0.00000
>  Residual             0.035380 0.18809
> Number of obs: 25, groups: block, 5
>
> Fixed effects:
>                           Estimate Std. Error t value
> (Intercept)                 0.45000    0.08412   5.350
> guild_removal              -0.10000    0.11896  -0.841
> enfa_removal               -0.36800    0.11896  -3.093
> guild_removal:enfa_removal  0.19700    0.15737   1.252
>
> Correlation of Fixed Effects:
>            (Intr) gld_rm enf_rm
> guild_remvl -0.707
> enfa_removl -0.707  0.500
> gld_rmvl:n_  0.535 -0.756 -0.756
>
>
> I really appreciate any suggestion!
> Thank you!
>
> --
> Chi Yuan
> Graduate Student
> Department of Ecology and Evolutionary Biology
> University of Arizona
> Room 106 Bioscience West
> lab phone: 520-621-1889
> Email:cyuan at email.arizona.edu
> Website: http://www.u.arizona.edu/~cyuan/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Bert Gunter
Genentech Nonclinical Biostatistics