[R-sig-eco] nlme model specification

Fri May 23 20:39:38 CEST 2008

Dear Thierry, Chris, Peter, and the list,

Based on these suggestions and some others, I realize now that tree id is clearly the grouping variable.  A couple people had mentioned in a previous letter that there is likely to be temporal autocorrelation within trees, and I am particularly interested in this aspect of the data, which prompted me to look into the autocorrelation structures described in Pinheiro and Bates.

I am still a little confused about the two different approaches for accounting for non-independence within trees among years (i.e. specifying tree.id as a random effect vs. specifying autocorrelation in the residuals.  I am currently considering these alternative models:

f.gls <- gls(growth ~ diameter*vines, corr = corAR1(form = ~ yr|tree.id)

f.lme <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id)

f.lme.ar1 <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id, correlation = corAR1(form = ~ yr | tree.id))

For f.gls, phi is 0.67, which pretty closely matches the value I determined empirically by correlating residuals from different years.
plot(ACF(f.gls)) also reveals highly signficant autocorrelation which decays smoothly with lag.  This approach is very appealing, because it closely matches what I see if I make a correlation matrix among years empirically from the residuals.

However, f.lme.ar1 also has significant correlation, and a lower AIC than f.gls (and f.lme).  But now phi is only 0.26, and plot(ACF(f.lme.ar1)) does not look like AR1, but is rather increasingly negative with longer lags.  I don't quite understand this change in phi, or the biological interpretation of the correlation structure.

So, even though f.gls has a higher AIC, I'd like to use that one because it seems more biologically interpretable.  Does this adequately account for the pseudoreplication across years, or do I really need that random effect of tree.id in there?

Again, many many thanks for all your help.  This makes a lot more sense to me now than it did.

Matt

>-----Original Message-----
>From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be]
>Sent: Thursday, May 22, 2008 11:15 AM
>To: Christian A. Parker; Landis, R Matthew
>Cc: r-sig-ecology at r-project.org
>Subject: RE: [R-sig-eco] nlme model specification
>
>Matthew,
>
>I support Chris' point of view. The random effects should model the
>dependence structure between the measurements. The parameters of your
>fixed effect will be better when you have specified the random effect
>correctly. Define the most complex model that are willing to
>accept, run
>it with different random effects and compare those models (by LRT or
>AIC, both under REML). Take the best one and refine your fixed effects
>(under ML).
>
>A few suggestions:
>
>M0 <- gls(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, method = 'REML')
>Ma <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|year,
>method = 'REML')
>M1 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|id,
>method = 'REML')
>M2 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>~year|id,
>method = 'REML')
>M3 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>~factor(year)|id, method = 'REML')
>M4 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random =
>list(id =
>pdCompSymm(~factor(year))), method = 'REML')
>
>anova(M0, Ma)
>anova(M0, M1, M2)
>anova(M0, M1, M3)
>anova(M0, M1, M4)
>
>Warning: M3 may take some time to calculate as it will have estimate 45
>parameters for the random effect.
>
>A random slope by year when you group by year makes no sense since you
>have in each group only data from one year. Hence it is impossible to
>have a slope by year in each group.
>
>Thierry
>
>
>---------------------------------------------------------------
>---------
>----
>ir. Thierry Onkelinx
>Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>and Forest
>Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
>methodology and quality assurance
>Gaverstraat 4
>9500 Geraardsbergen
>Belgium
>tel. + 32 54/436 185
>Thierry.Onkelinx at inbo.be
>www.inbo.be
>
>To call in the statistician after the experiment is done may be no more
>than asking him to perform a post-mortem examination: he may be able to
>say what the experiment died of.
>~ Sir Ronald Aylmer Fisher
>
>The plural of anecdote is not data.
>~ Roger Brinner
>
>The combination of some data and an aching desire for an
>answer does not
>ensure that a reasonable answer can be extracted from a given body of
>data.
>~ John Tukey
>
>-----Oorspronkelijk bericht-----
>Van: r-sig-ecology-bounces at r-project.org
>[mailto:r-sig-ecology-bounces at r-project.org] Namens Christian A. Parker
>Verzonden: donderdag 22 mei 2008 16:39
>Aan: Landis, R Matthew
>CC: 'r-sig-ecology at r-project.org'
>Onderwerp: Re: [R-sig-eco] nlme model specification
>
>Matthew
>Please correct me if I am wrong (anyone) but because your observations
>are not independent across your desired groups (years) your error terms
>will be biased which will then influence your significant tests. So
>regardless of the factor that you are interested you would
>still want to
>
>account for the fact that all measurements were taken on the same trees
>each year by doing a repeated measures model of some sort.
>Hope this helps,
>-Chris
>
>Landis, R Matthew wrote:
>> Dear R-sig-eco:
>>
>> Many thanks to all of those who took the time to reply to my
>question.
>The diversity of replies has made me go back and try to clarify my
>question.  Apologies for the length of the e-mail.  Thanks in
>advance to
>anyone willing to plow through this and understand it.  If you're ever
>in Middlebury I'll buy you a beer.
>>
>> To repeat, I have 300 trees, ranging in size from 10 - 150
>cm diameter
>(big trees).  To simplify my original question, let's say I want to
>understand the relationship between growth and two variables, diameter
>(continuous) and vine load (ordinal index from 1-4). I'd also like to
>know the relative importance of diameter vs. vine load, e.g. by partial
>R2.  If I had one year of data, this would be a simple regression.
>>
>> However, I have 9 years of annual measurements on the trees.  It's as
>if I have the above analysis repeated 9 times.  There was no initial
>treatment, so I view these 9 years as a random sample of the years in
>the life of the tree, and unlike most examples of repeated measures I
>have read, the time effect is of no interest whatsoever. That is, I am
>not interested in viewing xyplot(growth ~ time|id).  I don't expect to
>see any consistent directional response to time.  In a way, it's as if
>the 9 years represent blocks, (except that it's the same 300 trees in
>each block) -- this is why I view the yr as a random effect, and as the
>grouping variable.
>>
>> If I were to graph the data, I would use xyplot(growth ~ diameter|yr)
>to see what I am most interested in.  Grouping by individual doesn't
>make sense to me here because each individual only represents a very
>small slice of the full range of measurements - e.g. over the
>ten years,
>each tree only grows from 10 cm - 14 cm, so I can't really estimate the
>growth vs. diameter relationship for each tree.  xyplot(growth ~
>diameter|id) would not be useful. This is why I don't consider the
>individual to be the grouping variable, but perhaps I am wrong on this.
>>
>> So, now, as before, I am back to
>>
>> fit <- lme(fixed = growth ~ diameter * vines, random = ~ 1|year)
>>
>> I'm expecting that this will estimate separate intercepts for each
>year.  Which is what I want (I would like to fit separate
>slopes by year
>too, but that model didn't converge).
>>
>> I guess what I'm most concerned about is whether the significance
>tests obtained for each term use the appropriate error term and the
>appropriate degrees of freedom.  I'm currently using something like the
>following command to test the effect of diameter
>>
>> anova(fit.full.model, update(fit.full.model, . ~ vines))
>>
>> But maybe I'm way off base there.
>>
>> Thanks very much!
>>
>> Matt Landis

>