[R-sig-eco] nlme model specification
Landis, R Matthew
rlandis at middlebury.edu
Fri May 23 20:39:38 CEST 2008
Dear Thierry, Chris, Peter, and the list,
Based on these suggestions and some others, I realize now that tree id is clearly the grouping variable. A couple people had mentioned in a previous letter that there is likely to be temporal autocorrelation within trees, and I am particularly interested in this aspect of the data, which prompted me to look into the autocorrelation structures described in Pinheiro and Bates.
I am still a little confused about the two different approaches for accounting for non-independence within trees among years (i.e. specifying tree.id as a random effect vs. specifying autocorrelation in the residuals. I am currently considering these alternative models:
f.gls <- gls(growth ~ diameter*vines, corr = corAR1(form = ~ yr|tree.id)
f.lme <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id)
f.lme.ar1 <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id, correlation = corAR1(form = ~ yr | tree.id))
For f.gls, phi is 0.67, which pretty closely matches the value I determined empirically by correlating residuals from different years.
plot(ACF(f.gls)) also reveals highly signficant autocorrelation which decays smoothly with lag. This approach is very appealing, because it closely matches what I see if I make a correlation matrix among years empirically from the residuals.
However, f.lme.ar1 also has significant correlation, and a lower AIC than f.gls (and f.lme). But now phi is only 0.26, and plot(ACF(f.lme.ar1)) does not look like AR1, but is rather increasingly negative with longer lags. I don't quite understand this change in phi, or the biological interpretation of the correlation structure.
So, even though f.gls has a higher AIC, I'd like to use that one because it seems more biologically interpretable. Does this adequately account for the pseudoreplication across years, or do I really need that random effect of tree.id in there?
Again, many many thanks for all your help. This makes a lot more sense to me now than it did.
>From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be]
>Sent: Thursday, May 22, 2008 11:15 AM
>To: Christian A. Parker; Landis, R Matthew
>Cc: r-sig-ecology at r-project.org
>Subject: RE: [R-sig-eco] nlme model specification
>I support Chris' point of view. The random effects should model the
>dependence structure between the measurements. The parameters of your
>fixed effect will be better when you have specified the random effect
>correctly. Define the most complex model that are willing to
>it with different random effects and compare those models (by LRT or
>AIC, both under REML). Take the best one and refine your fixed effects
>A few suggestions:
>M0 <- gls(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, method = 'REML')
>Ma <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|year,
>method = 'REML')
>M1 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|id,
>method = 'REML')
>M2 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>method = 'REML')
>M3 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>~factor(year)|id, method = 'REML')
>M4 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>pdCompSymm(~factor(year))), method = 'REML')
>anova(M0, M1, M2)
>anova(M0, M1, M3)
>anova(M0, M1, M4)
>Warning: M3 may take some time to calculate as it will have estimate 45
>parameters for the random effect.
>A random slope by year when you group by year makes no sense since you
>have in each group only data from one year. Hence it is impossible to
>have a slope by year in each group.
>ir. Thierry Onkelinx
>Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
>methodology and quality assurance
>tel. + 32 54/436 185
>Thierry.Onkelinx at inbo.be
>To call in the statistician after the experiment is done may be no more
>than asking him to perform a post-mortem examination: he may be able to
>say what the experiment died of.
>~ Sir Ronald Aylmer Fisher
>The plural of anecdote is not data.
>~ Roger Brinner
>The combination of some data and an aching desire for an
>answer does not
>ensure that a reasonable answer can be extracted from a given body of
>~ John Tukey
>Van: r-sig-ecology-bounces at r-project.org
>[mailto:r-sig-ecology-bounces at r-project.org] Namens Christian A. Parker
>Verzonden: donderdag 22 mei 2008 16:39
>Aan: Landis, R Matthew
>CC: 'r-sig-ecology at r-project.org'
>Onderwerp: Re: [R-sig-eco] nlme model specification
>Please correct me if I am wrong (anyone) but because your observations
>are not independent across your desired groups (years) your error terms
>will be biased which will then influence your significant tests. So
>regardless of the factor that you are interested you would
>still want to
>account for the fact that all measurements were taken on the same trees
>each year by doing a repeated measures model of some sort.
>Hope this helps,
>Landis, R Matthew wrote:
>> Dear R-sig-eco:
>> Many thanks to all of those who took the time to reply to my
>The diversity of replies has made me go back and try to clarify my
>question. Apologies for the length of the e-mail. Thanks in
>anyone willing to plow through this and understand it. If you're ever
>in Middlebury I'll buy you a beer.
>> To repeat, I have 300 trees, ranging in size from 10 - 150
>(big trees). To simplify my original question, let's say I want to
>understand the relationship between growth and two variables, diameter
>(continuous) and vine load (ordinal index from 1-4). I'd also like to
>know the relative importance of diameter vs. vine load, e.g. by partial
>R2. If I had one year of data, this would be a simple regression.
>> However, I have 9 years of annual measurements on the trees. It's as
>if I have the above analysis repeated 9 times. There was no initial
>treatment, so I view these 9 years as a random sample of the years in
>the life of the tree, and unlike most examples of repeated measures I
>have read, the time effect is of no interest whatsoever. That is, I am
>not interested in viewing xyplot(growth ~ time|id). I don't expect to
>see any consistent directional response to time. In a way, it's as if
>the 9 years represent blocks, (except that it's the same 300 trees in
>each block) -- this is why I view the yr as a random effect, and as the
>> If I were to graph the data, I would use xyplot(growth ~ diameter|yr)
>to see what I am most interested in. Grouping by individual doesn't
>make sense to me here because each individual only represents a very
>small slice of the full range of measurements - e.g. over the
>each tree only grows from 10 cm - 14 cm, so I can't really estimate the
>growth vs. diameter relationship for each tree. xyplot(growth ~
>diameter|id) would not be useful. This is why I don't consider the
>individual to be the grouping variable, but perhaps I am wrong on this.
>> So, now, as before, I am back to
>> fit <- lme(fixed = growth ~ diameter * vines, random = ~ 1|year)
>> I'm expecting that this will estimate separate intercepts for each
>year. Which is what I want (I would like to fit separate
>slopes by year
>too, but that model didn't converge).
>> I guess what I'm most concerned about is whether the significance
>tests obtained for each term use the appropriate error term and the
>appropriate degrees of freedom. I'm currently using something like the
>following command to test the effect of diameter
>> anova(fit.full.model, update(fit.full.model, . ~ vines))
>> But maybe I'm way off base there.
>> Thanks very much!
>> Matt Landis
More information about the R-sig-ecology