Thanks for the reply, Seth. I guess I am still misunderstanding a few
things. I perceive my data (I admit perhaps wrongly...) as a more or less
classical, if unbalanced, hierarchical set of observations, with transect
being the replication of the fixed-effect vegetation treatment at the block
level, and with individual being the replication of the fixed-effect gap
position at the transect level. Does it matter in this case whether there
happens to be only two blocks, or that each vegetation treatment is
replicated once per block? I guess I thought that including these factors
as random effects would simply allow the response to fixed effects to
obtain a different intercept/slope at each level of the data (block,
transect, etc), and also allow for inference to the larger population (of
forest gaps) as opposed to purely within the context of the experiment. I
don't really care about estimating the random effects per se, but care
about them in the context of making realistic hypothesis tests about
fixed-effects.
In this way, I don't really understand how these data differ from examples
given in, say, Pinheiro and Bates (2000). Take, for example, the Machines
data in the lme() library, and which Pinheiro and Bates discuss at some
length. In the example, the random effects are modeled as machine within
worker, and the fixed effect tested is that of machine on productivity
score. It seemed logical (to me) to extend a similar model to my own data,
modeling random effects as individual within gap position within transect
within block (ignoring gap within block for reasons cited below). I tried a
bunch of different random effects structures and compared them in order to
gain a better understanding of how model fitting works, and became confused
at the results, thereby exposing my underlying ignorance!
As a final note, the reason that I have avoided using lm() or anova() to
analyze these data is my understanding that they are methods appropriate
only for strictly balanced data, which these are clearly not. I was
originally using lme(), given some recommendations on this list about
documentation, etc., but eventually decided that biting the bullet and
learning how to use lmer() would save me time in the long run.
Thanks again for your time,
Nathan
On Mon, Jan 7, 2013 at 10:17 PM, Seth W. Bigelow wrote:
> Nathan,
>
> I am no mixed model master but a humble R-using ecologist. Nevertheless I
> will make a few observations regarding your question and allow the true
> masters to correct me if I err. The first observation is that it takes a
> lot
> of data points in order to estimate a random effect properly, far more that
> it would appear you have for most of your effects. Think of estimating a
> random effect like estimating variance for a group of data points. You
> can't
> do a very good job estimating variance for two data points. Therefore, I
> don't think it is desirable to treat your two blocks or three gaps as
> random
> effects. As for transects, you only have one transect for each combination
> of block X treatment (Ok, you have two transects for one treatment in block
> B), so there's no way transect can be a random effect. On the other hand,
> you probably have enough observations on different individuals at each
> combination of block X treatment to make a decent estimate of variance. A
> model like 'lmer(response ~ vegtreat * gappos2 + (Block|TreeId))' might be
> a
> good place to start, though I make no guarantees here, particularly since I
> am more familiar with nlme, being somewhat of a beginner myself with mixed
> models. Perhaps an even better place to start would be to take the mean
> response at each combination of Treatment X GapPosition X Block, and play
> with the simple linear model lm(Treatment * gappos2).
>
> Good luck,
> -Seth W. Bigelow
>
>
>
>
>
>
>
> -----Original Message-----
> From: r-sig-mixed-models-bounces@r-project.org
> [mailto:r-sig-mixed-models-bounces@r-project.org] On Behalf Of Nathan
> Eustis
> Rutenbeck
> Sent: Monday, January 07, 2013 12:14 PM
> To: r-sig-mixed-models@r-project.org
> Subject: [R-sig-ME] Fwd: Assigning random effects
>
> My apologies, I am forwarding this corrected message:
>
>
> Hi mixed model masters,
>
> Forgive what may be a relatively elementary, yet overly detailed question
> regarding model specification. I am attempting to use lmer() to model
> photosynthetic gas exchange data for two tree species, white spruce and
> trembling aspen across forest gap positions and for two vegetation
> treatments. The experimental design is a replicated block design as
> follows:
>
> -In each of two blocks (A and B) there are four experimental harvest gaps.
> Three gaps were used for this study, one in the B block and two in the A
> block.
>
> -Within each gap there are nine transects. Five of these transects were
> chosen for this study, three in the block B gap and one in each of the gap
> A
> blocks. One of two vegetation management treatments were randomly assigned
> to each transect, a release and a sprout treatment. As I understand it,
> there are therefore replications of vegetation treatment within each block,
> but not within each gap.
>
> -Within each transect, four positions within the gap were delineated
> (understory, southern edge, center, northern edge) and at each gap position
> 6 individuals of each species were sampled. Because of data logging errors,
> however, in actuality there are between 2 and 12 samples at each gap
> position (originally there were five gap positions, but two were combined
> in
> order to avoid cell-values of zero in the experimental design).
>
> -For each individual, a series of gas exchange measurements were made (net
> assimilation, evapotranspiration, and water-use efficiency) at two
> different
> light levels. Note that I transformed the response variables in order to
> improve normality of residual error.
>
> I am therefore building a series of models - for each gas exchange
> measurement, for each of the two species, for each of the two light levels.
> For each model, I wish to perform hypothesis tests on the significance of
> fixed-effects *vegetation treatment *and *gap position *for the mean of the
> response variable (e.g. net assimilation). My question regards the correct
> method of assigning random effects within these models. From my
> perspective,
> there are several levels of random effects that could potentially be
> included: block, gap, transect, gap position, and individual tree, since
> each of these levels represents some component of additional environmental
> or genetic variability that I am not modelling explicitly with the
> fixed-effects. An example of what I have been doing follows:
>
> >(fm1<-lmer(log(photo)~vegtrt+gappos2+(1|block),data=subset(m1200,specie
> >s=="PIGL")))
> >(fm2<-lmer(log(photo)~vegtrt+gappos2+(1|block/gap),data=subset(m1200,sp
> >ecies=="PIGL")))
> >(fm3<-lmer(log(photo)~vegtrt+gappos2+
> (1|block/gap/transect/gappos2),data=subset(m1200,species=="PIGL")))
> >(fm4<-lmer(log(photo)~vegtrt+gappos2+(1|block/gap/transect/gappos2/ind)
> >,data=subset(m1200,species=="PIGL")))
> >anova(fm1,fm2,fm3,fm4)
>
>
> Which generates this output:
>
> Models:
> fm1: log(photo) ~ vegtrt + gappos2 + (1 | block)
> fm2: log(photo) ~ vegtrt + gappos2 + (1 | block/gap)
> fm3: log(photo) ~ vegtrt + gappos2 + (1 | block/gap/transect/gappos2)
> fm4: log(photo) ~ vegtrt + gappos2 + (1 | block/gap/transect/gappos2/ind)
> Df AIC BIC logLik Chisq Chi Df
> Pr(>Chisq)
> fm1 7 137.10 155.62 -61.552
> fm2 8 122.03 143.19 -53.016 17.072 1
> 3.598e-05
> fm3 10 126.81 153.25 -53.406 0.000 2 1
> fm4 11 128.81 157.90 -53.406 0.000 1 1
>
>
> As I understand, based on AIC, BIC, and log-likelihood tests, the best
> model
> for this particular response variable is the fm2 object. When I call the
> summary() function, however, I get the following output for the random
> effects portion:
>
> ...
> Random effects:
> Groups Name Variance Std.Dev.
> gap:block (Intercept) 1.7856e-01 4.2257e-01
> block (Intercept) 2.7164e-18 1.6482e-09
> Residual 1.5357e-01 3.9188e-01
> Number of obs: 104, groups: gap:block, 3; block, 2 ...
>
> The variance components are so small for the random effects (and the sd for
> them so large in comparison), though, that it makes me wonder if this is
> not, in actuality, over-fitting the model and there is some other
> (superior) method for deciding which random effects to include in a
> designed
> experiment like this. Does the structure of the replication make a
> difference to how I must specify the random effects within the model (i.e.,
> can I even include 'gap' because it is just an arbitrary, unreplicated
> division of the block level)? I also wonder if I should perform similar
> model selection procedures for each of my (many) models, or decide on a
> sensible random effects structure for all of them and stick with it.
>
> I also, like many people, am still wondering how to properly assess and
> report the significance of the fixed effects for these models
> (log-likelihood tests versus a null or alternative model?) and whether
> multiple comparisons using glht() are valid for these objects given Douglas
> Bates' comments regarding denominator degrees of freedom, but am more than
> happy to leave these questions aside for now.
>
> Thanks for your time,
>
> -Nathan
>
> --
> *Nathan E. Rutenbeck*
> PhD Student
> Silviculture and Forest Ecology
>
> 231 Nutting Hall
> School of Forest Resources
> University of Maine
>
> 207.664.9581 (cell)
>
>
>
> --
> *Nathan E. Rutenbeck*
> Research Assistant
> Silviculture and Forest Ecology
>
> 231 Nutting Hall
> School of Forest Resources
> University of Maine
>
> 207.664.9581 (cell)
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
--
*Nathan E. Rutenbeck*
PhD Student
Silviculture and Forest Ecology
231 Nutting Hall
School of Forest Resources
University of Maine
207.664.9581 (cell)
[[alternative HTML version deleted]]