[R-sig-ME] [R-sig-phylo] possible issue when incorporating a phylogenetic correlation structure (corPagel) in a linear mixed effect model (lme)

Simon Blomberg s.blomberg1 at uq.edu.au
Wed Mar 7 05:42:35 CET 2012


Hi,

I'm not sure why this is occurring (and I've cc'ed the r-sig-me list as 
they might know more). It appears to me that there is some problem with 
the parameterisation of the random effects, and how that interacts with 
the correlation structure. The usual random=~1|date parameterisation 
uses a log-Cholesky factorisation, which is different to the pdIdent 
parameterisation, but as you say in this case should give the same 
answers. It looks like there may be something going on deep in the lme 
internals. Maybe someone on r-sig-me can help.

Sorry I can't be of more help. It's an interesting problem and I would 
like to see it resolved too.

Best,

Simon.

On 01/03/12 00:47, Rudolf Philippe Rohr wrote:
> Hello.
>
> I'm writing about a possible issue when incorporating a phylogenetic 
> correlation structure (corPagel) in a linear mixed effect model (lme).
>
> There are two techniques for incorporating a random effect in a linear 
> model:
>
> t1 (it is the traditional way): m1 <- lme( y ~ x, random = ~1|date)
>
> t2: u = gl(1,1,length(y))
> m2 <- lme(y ~ x, random = list(u = pdIdent(form=~factor(date)-1)))
>
> (http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg11749.html)
>
> > str(d)
> 'data.frame': 446 obs. of 3 variables:
> $ y : num 2.197 0.693 0.693 0 0.693 ...
> $ x : Factor w/ 2 levels "N","Y": 2 1 1 1 1 2 2 2 2 2 ...
> $ date: Factor w/ 7 levels "12","16","21",..: 5 6 7 4 7 6 2 7 3 6 ...
>
> These two techniques have to give the same output, and it is exactly 
> what happens (same parameter estimations, same log-like, same 
> p-values, same values for the random factor, …)
>
> > summary(m1)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1366.617 1383 -679.3083
>
> Random effects:
> Formula: ~1 | date
> (Intercept) Residual
> StdDev: 0.4283086 1.086683
>
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.4430143 0.1856287 438 7.773658 0.0000
> xY 0.0700834 0.1119626 438 0.625953 0.5317
> Correlation:
> (Intr)
> xY -0.397
>
> > summary(m2)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1366.617 1383 -679.3083
>
> Random effects:
> Formula: ~factor(date) - 1 | u
> Structure: Multiple of an Identity
> factor(date)12 factor(date)16 factor(date)21 factor(date)26 
> factor(date)30 factor(date)35 factor(date)38
> StdDev: 0.4283086 0.4283086 0.4283086 0.4283086 0.4283086 0.4283086 
> 0.4283086
> Residual
> StdDev: 1.086683
>
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.4430143 0.1856287 444 7.773658 0.0000
> xY 0.0700834 0.1119626 444 0.625953 0.5317
> Correlation:
> (Intr)
> xY -0.397
>
>
> Things get strange when incorporating a phylogenetic correlation 
> structure with corPagel (and also with corGrafen):
>
> t1: m3 <- lme( y ~ x, random = ~1|date, correlation = 
> corPagel(0.5,tree,fixed=FALSE))
>
> t2: u = gl(1,1,length(y))
> m4 <- lme(y ~ x, random = list(u = pdIdent(form=~factor(date)-1)), 
> correlation = corPagel(0.5,tree,fixed=FALSE)))
>
> Again, these two methods should give the same output, however here:
>
> t1 always gives the following error message:
>
> Error in corFactor.corStruct(object) :
> NA/NaN/Inf in foreign function call (arg 1)
>
> t2: the optimization process converges, and gives reasonable output.
>
> > summary(m4)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1350.275 1370.754 -670.1376
>
> Random effects:
> Formula: ~factor(date) - 1 | u
> Structure: Multiple of an Identity
> factor(date)12 factor(date)16 factor(date)21 factor(date)26 
> factor(date)30 factor(date)35 factor(date)38
> StdDev: 0.3966233 0.3966233 0.3966233 0.3966233 0.3966233 0.3966233 
> 0.3966233
> Residual
> StdDev: 1.141081
>
> Correlation Structure: corPagel
> Formula: ~1 | u
> Parameter estimate(s):
> lambda
> 0.2846964
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.1950145 0.3493305 444 3.420871 0.0007
> xY -0.0312983 0.1139128 444 -0.274757 0.7836
> Correlation:
> (Intr)
> xY -0.18
>
> First question: why is it that t1 does not work, while t2 does?
>
>
> To go a step forward with this problem, we can set the lambda value in 
> t1 with the value obtained from t2.
>
> t1bis: m3bis <-lme( y ~ x, random = ~1|date, correlation = 
> corPagel(0.2846964,tree,fixed=TRUE))
>
> This time, t1bis converges. However the output is completely different 
> from t2.
>
> > summary(m3bis)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 14907.8 14924.19 -7449.902
>
> Random effects:
> Formula: ~1 | date
> (Intercept) Residual
> StdDev: 809763.6 4632624
>
> Correlation Structure: corPagel
> Formula: ~1 | date
> Parameter estimate(s):
> lambda
> 0.2846964
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) -329242.7 324363.4 438 -1.01504 0.3106
> xY 3.7 0.0 438 127.43249 0.0000
> Correlation:
> (Intr)
> xY 0.014
>
> Second question: how is that possible? The two outputs should be the 
> same.
>
>
> To try to understand what is going on, we can compute the profile 
> log-likelihood curve, as a function of lambda, for both techniques.
>
> lambda <- seq(0,1,0.01)
>
> loglike.t1 <- rep(NA,length(lambda))
> loglike.t2 <- rep(NA,length(lambda))
>
> for (i in 1:length(lambda)){
>
> m1 <- lme(y ~ x, random = ~1|date, correlation = 
> corPagel(lambda[i],tree,fixed=TRUE))
> loglike.t1[i] <- m1$logLik
>
> u = gl(1,1,length(d$y))
> m2 <- lme(y ~ x,random = list(u = pdIdent(form=~factor(date)-1)), 
> correlation = corPagel(lambda[i],tree,fixed=TRUE))
> loglike.t2[i] <- m2$logLik
>
> }
>
> The two curves are completely different:
>
> With t2, we obtain a reasonable curve, with a maximum at the 
> previously estimated lambda value.
>
> With t1, the curve looks really strange: there is a discontinuity at 
> the origin, i.e., for lambda=0 the log-like value is higher than for 
> lambda>0, and for lambda>0 the log-like is only increasing with lambda.
>
> Thus a third question: why are these two profile log-likelihood curves 
> different?
>
> The final question is: in which technique can we believe?
>
> I’m using the version 2.14.1 of R, 3.1-96 of nlme, and 3.0-1 of ape.
>
> Best,
> Rudolf
>

-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
http://www.uq.edu.au/~uqsblomb/

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

Statistics is the grammar of science - Karl Pearson.




More information about the R-sig-mixed-models mailing list