[R] Some questions to GLMM

Tue Nov 9 12:19:01 CET 2004

Hello all R-user

I am relative new to the R-environment and also to GLMM, so please don't be 
irritated if some questions don't make sense.
I am using R 2.0.0 on Windows 2000.

I investigated the occurrence of insects (count) in different parts of 
different plants (plantid) and recorded as well some characteristics of the 
plant parts (e.g. thickness). It is an unbalanced design with 21 plants with 
approx 25 parts each.
Preference of the insects for a certain characteristic is usually unimodal.
As far as I understood, I have to use a model with random intercepts and 
slopes, because the observations within each plant are not independent.

So far so good

========(lme4)=========

glmm1<-GLMM(count~thick+I(thick^2),random=~thick+I(thick^2)
|plantid,poisson,data=Dataset,control=list(PQLmaxIt=10000))
> summary(glmm1)
Generalized Linear Mixed Model

Family: poisson family with log link
Fixed: lixt ~ thick + I(thick^2) 
Data: Dataset 
       AIC      BIC   logLik
 -125.2406 -83.7346 72.62031

Random effects:
 Groups  Name        Variance  Std.Dev. Corr          
 plantid (Intercept) 0.0173455 0.131702               
         thick       0.0389772 0.197426 -1.000        
         I(thick^2)  0.0013327 0.036507  1.000 -1.000 
# of obs: 469, groups: plantid, 21

Estimated scale (compare to 1)  1.402567 

Fixed effects:
             Estimate Std. Error z value  Pr(>|z|)    
(Intercept) -4.045569   0.346950 -11.660 < 2.2e-16 ***
thick        2.378207   0.195425  12.169 < 2.2e-16 ***
I(thick^2)  -0.280898   0.025458 -11.034 < 2.2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Correlation of Fixed Effects:
           (Intr) thick 
thick      -0.968       
I(thick^2)  0.900 -0.977

============================

Question 1: Is the formula suitable for my design?

Question 2: What can be the reason for the positive logLik-value?

Question 3: It is correct, that I do not have to take care about over-
dispersion.

Question 4: When I use "poly(thick,2)" instead of "thick+I(thick^2)" I get 
completly different estimate-values (the latter one are the correct one). I 
thought it should be the same.

============================
> glmm2<-GLMM(count~poly(thick,2),random=~poly(thick,2)
|plantid,poisson,data=Dataset,control=list(PQLmaxIt=10000))
> summary(glmm2)
[...]

Fixed effects:
                 Estimate Std. Error  z value  Pr(>|z|)    
(Intercept)      -0.69293    0.10711  -6.4694 9.837e-11 ***
poly(thick, 2)1 -47.23211    5.97336  -7.9071 2.634e-15 ***
poly(thick, 2)2 -51.21421    4.64972 -11.0145 < 2.2e-16 ***

============================

Question 5: If I use the same formula in glmmPQL, I get more or less similar 
results, but different values for AIC BIC and logLik.
I read in this thread:
http://maths.newcastle.edu.au/~rking/R/help/03b/6849.html
That it should be the same value due to the same algorithm  

Maybe as additional comment, I specified a "NULL-model"
> dummy<-rep(1,469)
> glmm3<-GLMM(count~1,random=~1|dummy,poisson)
resp
> Dataset$dummy<-1
> glmm3<-glmmPQL(count~1,random=~1|dummy,poisson)

and GLMM gave the correct value (logLik = (Null deviance/2) from glm),whereas 
glmmPQL calculated a logLik almost twice as high.

=========(MASS)=====
> glmm1<-glmmPQL(lixt~thick+I(thick^2),random=~thick+I(thick^2)
|plantid,poisson,Dataset)
iteration 1 
[...] 
iteration 10 
> summary(glmm1)

Linear mixed-effects model fit by maximum likelihood
 Data: Dataset 
       AIC      BIC    logLik
  1988.500 2030.006 -984.2502

Random effects:
 Formula: ~thick + I(thick^2) | plantid
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr         
(Intercept) 0.27914612 (Intr) thick 
thick       0.03089333 -0.251       
I(thick^2)  0.01867846 -0.878 -0.067
Residual    1.37952490              

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: lixt ~ thick + I(thick^2) 
                Value Std.Error  DF   t-value p-value
(Intercept) -4.038800 0.4834792 446 -8.353617       0
thick        2.371842 0.2642474 446  8.975837       0
I(thick^2)  -0.279189 0.0337451 446 -8.273479       0
 Correlation: 
           (Intr) thick 
thick      -0.970       
I(thick^2)  0.896 -0.973

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.2093444 -0.5972699 -0.3725736  0.2819543  6.1820947 

Number of Observations: 469
Number of Groups: 21 
=============================

Question 6: When I tried method="Laplace" an error occurred:
> g1<-GLMM(count~thick+I(thick^2),random=~thick+I(thick^2)
|plantid,poisson,data=Dataset,method="Laplace")
Using optimizer nlm 
Error: rank deficiency of ZtZ+W detected at column 11

Is that indicating multicollinearity?
Is there is a possibility to avoid this?

I got also from time to time the message

Omega[1] is not positive definite

Can I fix this somehow?

Question 7: Is there a possibility to calculate something like a pseudo-r-
square (e.g. 1-(logLik(Nullmodel)/loglike model)?)

Question 8: When I specify the whole model as fix as e.g:
glm1<-glm(count=(thick+I(thick^2))+(thick+I(thick^2))%in%
plantid,poisson,data=Dataset)
Is the model than wrong or just less powerful? I guess the latter is the case, 
but I would like to be sure. If in this version is just the power decreased, it 
would be very helpful for me to use for the more complex models this approach, 
because errors and warnings are becoming more frequent with more factors and 
covariates.  

I know that there are a lot of questions (I am sorry for that), and I don't 
expect that all will be commented. Most interesting for me are the questions 
1,6 and 8.
Thank you very much in advance.
Jan

========================================
Jan Hattendorf
University of Berne 
Zoological Institute
Baltzerstrasse 6 
CH-3012 Berne 
+41-31-631 4523
jan.hattendorf at zos.unibe.ch
http://www.cx.unibe.ch/zos/index.html