[R-sig-phylo] possible issue when incorporating a phylogenetic correlation structure (corPagel) in a linear mixed effect model (lme)

Simon Blomberg s.blomberg1 at uq.edu.au
Thu Mar 1 04:26:34 CET 2012


Can you provide a data frame and tree which reproduces the problem? The 
following code throws an error for both fit3 and fit4. I can't reproduce 
your non-error! Also, you may want to update your nlme to the current 
version (3.1-103).

  library(ape)
  library(nlme)
  library(MASS)
  set.seed(123)
  tr <- compute.brlen(rtree(446))
  y <- mvrnorm(1, Sigma=vcv(tr), mu=rep(0,446))
  dat <- data.frame(y=y, x=factor(sample(c("N","Y"), 446, 
replace=TRUE)), date=factor(sample(1:7, 446, replace=TRUE)))
  rownames(dat) <- tr$tip.label

  str(dat)
'data.frame':    446 obs. of  3 variables:
  $ y   : num  -0.708 -0.686 -0.791 -0.808 -0.747 ...
  $ x   : Factor w/ 2 levels "N","Y": 2 1 1 2 2 1 2 1 1 1 ...
  $ date: Factor w/ 7 levels "1","2","3","4",..: 3 2 3 2 1 6 4 7 5 6 ...

  fit1 <- lme(y~x, random=~1|date, data=dat)
  fit2 <- lme(y~x, random=list(date=pdIdent(form=~date-1)), data=dat)

  fit3 <- lme(y~x, random=~1|date, correlation=corPagel(.5, tr), data=dat)

Error in corFactor.corStruct(object) :
   NA/NaN/Inf in foreign function call (arg 1)

  fit4 <- lme(y~x, random=list(date=pdIdent(form=~date-1)), data=dat,  
correlation=corPagel(.5, tr))

Error in corFactor.corStruct(object) :
   NA/NaN/Inf in foreign function call (arg 1)


Cheers,

Simon.

On 01/03/12 00:47, Rudolf Philippe Rohr wrote:
> Hello.
>
> I'm writing about a possible issue when incorporating a phylogenetic 
> correlation structure (corPagel) in a linear mixed effect model (lme).
>
> There are two techniques for incorporating a random effect in a linear 
> model:
>
> t1 (it is the traditional way): m1 <- lme( y ~ x, random = ~1|date)
>
> t2: u = gl(1,1,length(y))
> m2 <- lme(y ~ x, random = list(u = pdIdent(form=~factor(date)-1)))
>
> (http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg11749.html)
>
> > str(d)
> 'data.frame': 446 obs. of 3 variables:
> $ y : num 2.197 0.693 0.693 0 0.693 ...
> $ x : Factor w/ 2 levels "N","Y": 2 1 1 1 1 2 2 2 2 2 ...
> $ date: Factor w/ 7 levels "12","16","21",..: 5 6 7 4 7 6 2 7 3 6 ...
>
> These two techniques have to give the same output, and it is exactly 
> what happens (same parameter estimations, same log-like, same 
> p-values, same values for the random factor, …)
>
> > summary(m1)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1366.617 1383 -679.3083
>
> Random effects:
> Formula: ~1 | date
> (Intercept) Residual
> StdDev: 0.4283086 1.086683
>
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.4430143 0.1856287 438 7.773658 0.0000
> xY 0.0700834 0.1119626 438 0.625953 0.5317
> Correlation:
> (Intr)
> xY -0.397
>
> > summary(m2)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1366.617 1383 -679.3083
>
> Random effects:
> Formula: ~factor(date) - 1 | u
> Structure: Multiple of an Identity
> factor(date)12 factor(date)16 factor(date)21 factor(date)26 
> factor(date)30 factor(date)35 factor(date)38
> StdDev: 0.4283086 0.4283086 0.4283086 0.4283086 0.4283086 0.4283086 
> 0.4283086
> Residual
> StdDev: 1.086683
>
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.4430143 0.1856287 444 7.773658 0.0000
> xY 0.0700834 0.1119626 444 0.625953 0.5317
> Correlation:
> (Intr)
> xY -0.397
>
>
> Things get strange when incorporating a phylogenetic correlation 
> structure with corPagel (and also with corGrafen):
>
> t1: m3 <- lme( y ~ x, random = ~1|date, correlation = 
> corPagel(0.5,tree,fixed=FALSE))
>
> t2: u = gl(1,1,length(y))
> m4 <- lme(y ~ x, random = list(u = pdIdent(form=~factor(date)-1)), 
> correlation = corPagel(0.5,tree,fixed=FALSE)))
>
> Again, these two methods should give the same output, however here:
>
> t1 always gives the following error message:
>
> Error in corFactor.corStruct(object) :
> NA/NaN/Inf in foreign function call (arg 1)
>
> t2: the optimization process converges, and gives reasonable output.
>
> > summary(m4)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 1350.275 1370.754 -670.1376
>
> Random effects:
> Formula: ~factor(date) - 1 | u
> Structure: Multiple of an Identity
> factor(date)12 factor(date)16 factor(date)21 factor(date)26 
> factor(date)30 factor(date)35 factor(date)38
> StdDev: 0.3966233 0.3966233 0.3966233 0.3966233 0.3966233 0.3966233 
> 0.3966233
> Residual
> StdDev: 1.141081
>
> Correlation Structure: corPagel
> Formula: ~1 | u
> Parameter estimate(s):
> lambda
> 0.2846964
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) 1.1950145 0.3493305 444 3.420871 0.0007
> xY -0.0312983 0.1139128 444 -0.274757 0.7836
> Correlation:
> (Intr)
> xY -0.18
>
> First question: why is it that t1 does not work, while t2 does?
>
>
> To go a step forward with this problem, we can set the lambda value in 
> t1 with the value obtained from t2.
>
> t1bis: m3bis <-lme( y ~ x, random = ~1|date, correlation = 
> corPagel(0.2846964,tree,fixed=TRUE))
>
> This time, t1bis converges. However the output is completely different 
> from t2.
>
> > summary(m3bis)
> Linear mixed-effects model fit by REML
> Data: d
> AIC BIC logLik
> 14907.8 14924.19 -7449.902
>
> Random effects:
> Formula: ~1 | date
> (Intercept) Residual
> StdDev: 809763.6 4632624
>
> Correlation Structure: corPagel
> Formula: ~1 | date
> Parameter estimate(s):
> lambda
> 0.2846964
> Fixed effects: y ~ x
> Value Std.Error DF t-value p-value
> (Intercept) -329242.7 324363.4 438 -1.01504 0.3106
> xY 3.7 0.0 438 127.43249 0.0000
> Correlation:
> (Intr)
> xY 0.014
>
> Second question: how is that possible? The two outputs should be the 
> same.
>
>
> To try to understand what is going on, we can compute the profile 
> log-likelihood curve, as a function of lambda, for both techniques.
>
> lambda <- seq(0,1,0.01)
>
> loglike.t1 <- rep(NA,length(lambda))
> loglike.t2 <- rep(NA,length(lambda))
>
> for (i in 1:length(lambda)){
>
> m1 <- lme(y ~ x, random = ~1|date, correlation = 
> corPagel(lambda[i],tree,fixed=TRUE))
> loglike.t1[i] <- m1$logLik
>
> u = gl(1,1,length(d$y))
> m2 <- lme(y ~ x,random = list(u = pdIdent(form=~factor(date)-1)), 
> correlation = corPagel(lambda[i],tree,fixed=TRUE))
> loglike.t2[i] <- m2$logLik
>
> }
>
> The two curves are completely different:
>
> With t2, we obtain a reasonable curve, with a maximum at the 
> previously estimated lambda value.
>
> With t1, the curve looks really strange: there is a discontinuity at 
> the origin, i.e., for lambda=0 the log-like value is higher than for 
> lambda>0, and for lambda>0 the log-like is only increasing with lambda.
>
> Thus a third question: why are these two profile log-likelihood curves 
> different?
>
> The final question is: in which technique can we believe?
>
> I’m using the version 2.14.1 of R, 3.1-96 of nlme, and 3.0-1 of ape.
>
> Best,
> Rudolf
>

-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat.
Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
http://www.uq.edu.au/~uqsblomb/

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

Statistics is the grammar of science - Karl Pearson.



More information about the R-sig-phylo mailing list