[R-sig-ME] Fwd: Assigning random effects

Tue Jan 8 04:17:13 CET 2013

Nathan, 

I am no mixed model master but a humble R-using ecologist. Nevertheless I
will make a few observations regarding your question and allow the true
masters to correct me if I err. The first observation is that it takes a lot
of data points in order to estimate a random effect properly, far more that
it would appear you have for most of your effects. Think of estimating a
random effect like estimating variance for a group of data points. You can't
do a very good job estimating variance for two data points. Therefore, I
don't think it is desirable to treat your two blocks or three gaps as random
effects. As for transects, you only have one transect for each combination
of block X treatment (Ok, you have two transects for one treatment in block
B), so there's no way transect can be a random effect. On the other hand,
you probably have enough observations on different individuals at each
combination of block X treatment to make a decent estimate of variance. A
model like 'lmer(response ~ vegtreat * gappos2 + (Block|TreeId))' might be a
good place to start, though I make no guarantees here, particularly since I
am more familiar with nlme, being somewhat of a beginner myself with mixed
models. Perhaps an even better place to start would be to take the mean
response at each combination of Treatment X GapPosition X Block, and play
with the simple linear model lm(Treatment * gappos2).  

Good luck,
-Seth W. Bigelow

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Nathan Eustis
Rutenbeck
Sent: Monday, January 07, 2013 12:14 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Fwd: Assigning random effects

My apologies, I am forwarding this corrected message:

Hi mixed model masters,

Forgive what may be a relatively elementary, yet overly detailed question
regarding model specification. I am attempting to use lmer() to model
photosynthetic gas exchange data for two tree species, white spruce and
trembling aspen across forest gap positions and for two vegetation
treatments. The experimental design is a replicated block design as
follows:

-In each of two blocks (A and B) there are four experimental harvest gaps.
Three gaps were used for this study, one in the B block and two in the A
block.

-Within each gap there are nine transects. Five of these transects were
chosen for this study, three in the block B gap and one in each of the gap A
blocks. One of two vegetation management treatments were randomly assigned
to each transect, a release and a sprout treatment. As I understand it,
there are therefore replications of vegetation treatment within each block,
but not within each gap.

-Within each transect, four positions within the gap were delineated
(understory, southern edge, center, northern edge) and at each gap position
6 individuals of each species were sampled. Because of data logging errors,
however, in actuality there are between 2 and 12 samples at each gap
position (originally there were five gap positions, but two were combined in
order to avoid cell-values of zero in the experimental design).

-For each individual, a series of gas exchange measurements were made (net
assimilation, evapotranspiration, and water-use efficiency) at two different
light levels. Note that I transformed the response variables in order to
improve normality of residual error.

I am therefore building a series of models - for each gas exchange
measurement, for each of the two species, for each of the two light levels.
For each model, I wish to perform hypothesis tests on the significance of
fixed-effects *vegetation treatment *and *gap position *for the mean of the
response variable (e.g. net assimilation). My question regards the correct
method of assigning random effects within these models. From my perspective,
there are several levels of random effects that could potentially be
included: block, gap, transect, gap position, and individual tree, since
each of these levels represents some component of additional environmental
or genetic variability that I am not modelling explicitly with the
fixed-effects. An example of what I have been doing follows:

>(fm1<-lmer(log(photo)~vegtrt+gappos2+(1|block),data=subset(m1200,specie
>s=="PIGL")))
>(fm2<-lmer(log(photo)~vegtrt+gappos2+(1|block/gap),data=subset(m1200,sp
>ecies=="PIGL")))
>(fm3<-lmer(log(photo)~vegtrt+gappos2+
(1|block/gap/transect/gappos2),data=subset(m1200,species=="PIGL")))
>(fm4<-lmer(log(photo)~vegtrt+gappos2+(1|block/gap/transect/gappos2/ind)
>,data=subset(m1200,species=="PIGL")))
>anova(fm1,fm2,fm3,fm4)

Which generates this output:

Models:
fm1: log(photo) ~ vegtrt + gappos2 + (1 | block)
fm2: log(photo) ~ vegtrt + gappos2 + (1 | block/gap)
fm3: log(photo) ~ vegtrt + gappos2 + (1 | block/gap/transect/gappos2)
fm4: log(photo) ~ vegtrt + gappos2 + (1 | block/gap/transect/gappos2/ind)
           Df    AIC         BIC          logLik       Chisq      Chi Df
 Pr(>Chisq)
fm1     7     137.10    155.62    -61.552
fm2     8     122.03    143.19    -53.016    17.072     1
 3.598e-05
fm3    10    126.81    153.25    -53.406     0.000      2            1
fm4    11    128.81    157.90    -53.406     0.000      1            1

As I understand, based on AIC, BIC, and log-likelihood tests, the best model
for this particular response variable is the fm2 object. When I call the
summary() function, however, I get the following output for the random
effects portion:

...
Random effects:
 Groups         Name           Variance        Std.Dev.
 gap:block     (Intercept)    1.7856e-01    4.2257e-01
 block            (Intercept)    2.7164e-18    1.6482e-09
 Residual                           1.5357e-01    3.9188e-01
Number of obs: 104, groups: gap:block, 3; block, 2 ...

The variance components are so small for the random effects (and the sd for
them so large in comparison), though, that it makes me wonder if this is
not, in actuality, over-fitting the model and there is some other
(superior) method for deciding which random effects to include in a designed
experiment like this. Does the structure of the replication make a
difference to how I must specify the random effects within the model (i.e.,
can I even include 'gap' because it is just an arbitrary, unreplicated
division of the block level)? I also wonder if I should perform similar
model selection procedures for each of my (many) models, or decide on a
sensible random effects structure for all of them and stick with it.

I also, like many people, am still wondering how to properly assess and
report the significance of the fixed effects for these models
(log-likelihood tests versus a null or alternative model?) and whether
multiple comparisons using glht() are valid for these objects given Douglas
Bates' comments regarding denominator degrees of freedom, but am more than
happy to leave these questions aside for now.

Thanks for your time,

-Nathan

--
*Nathan E. Rutenbeck*
PhD Student
Silviculture and Forest Ecology

231 Nutting Hall
School of Forest Resources
University of Maine

207.664.9581 (cell)

--
*Nathan E. Rutenbeck*
Research Assistant
Silviculture and Forest Ecology

231 Nutting Hall
School of Forest Resources
University of Maine

207.664.9581 (cell)

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models