[R] Some questions to GLMM

Tue Nov 9 16:04:28 CET 2004

Jan Hattendorf wrote:
> Hello all R-user
> 
> I am relative new to the R-environment and also to GLMM, so please don't be 
> irritated if some questions don't make sense.
> I am using R 2.0.0 on Windows 2000.
> 
> I investigated the occurrence of insects (count) in different parts of 
> different plants (plantid) and recorded as well some characteristics of the 
> plant parts (e.g. thickness). It is an unbalanced design with 21 plants with 
> approx 25 parts each.
> Preference of the insects for a certain characteristic is usually unimodal.
> As far as I understood, I have to use a model with random intercepts and 
> slopes, because the observations within each plant are not independent.
> 
> So far so good
> 
> ========(lme4)=========
> 
> glmm1<-GLMM(count~thick+I(thick^2),random=~thick+I(thick^2)
> |plantid,poisson,data=Dataset,control=list(PQLmaxIt=10000))
> 
>>summary(glmm1)
> 
> Generalized Linear Mixed Model
> 
> Family: poisson family with log link
> Fixed: lixt ~ thick + I(thick^2) 
> Data: Dataset 
>        AIC      BIC   logLik
>  -125.2406 -83.7346 72.62031
> 
> Random effects:
>  Groups  Name        Variance  Std.Dev. Corr          
>  plantid (Intercept) 0.0173455 0.131702               
>          thick       0.0389772 0.197426 -1.000        
>          I(thick^2)  0.0013327 0.036507  1.000 -1.000 
> # of obs: 469, groups: plantid, 21
> 
> Estimated scale (compare to 1)  1.402567 
> 
> Fixed effects:
>              Estimate Std. Error z value  Pr(>|z|)    
> (Intercept) -4.045569   0.346950 -11.660 < 2.2e-16 ***
> thick        2.378207   0.195425  12.169 < 2.2e-16 ***
> I(thick^2)  -0.280898   0.025458 -11.034 < 2.2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> 
> Correlation of Fixed Effects:
>            (Intr) thick 
> thick      -0.968       
> I(thick^2)  0.900 -0.977
> 
> ============================
> 
> Question 1: Is the formula suitable for my design?

My guess is that your model is overparameterized.  Notice that the 
estimated correlations of the random effects are all near the extremes 
of -1 or +1.  I would start with a model that had fewer random effects 
terms.

> Question 2: What can be the reason for the positive logLik-value?

It is a common misconception that a log-likelihood must be negative.  In 
fact, a log-likelihood can be positive when it is based on a probability 
density.  Probabilities cannot exceed one but probability densities can. 
  In this case the log-likelihood is a combination of the probability 
density for the random effects and the conditional probability of the 
observations.

> Question 3: It is correct, that I do not have to take care about over-
> dispersion.

There is an indication of overdispersion but I would not worry about 
that until I could get a handle on the random effects terms.

> Question 4: When I use "poly(thick,2)" instead of "thick+I(thick^2)" I get 
> completly different estimate-values (the latter one are the correct one). I 
> thought it should be the same.

They fit the same model but with a different set of coefficients hence 
the estimated values are different.

> ============================
> 
>>glmm2<-GLMM(count~poly(thick,2),random=~poly(thick,2)
> 
> |plantid,poisson,data=Dataset,control=list(PQLmaxIt=10000))
> 
>>summary(glmm2)
> 
> [...]
> 
> Fixed effects:
>                  Estimate Std. Error  z value  Pr(>|z|)    
> (Intercept)      -0.69293    0.10711  -6.4694 9.837e-11 ***
> poly(thick, 2)1 -47.23211    5.97336  -7.9071 2.634e-15 ***
> poly(thick, 2)2 -51.21421    4.64972 -11.0145 < 2.2e-16 ***
> 
> ============================
> 
> Question 5: If I use the same formula in glmmPQL, I get more or less similar 
> results, but different values for AIC BIC and logLik.
> I read in this thread:
> http://maths.newcastle.edu.au/~rking/R/help/03b/6849.html
> That it should be the same value due to the same algorithm  

The GLMM function evaluates the likelihood at the PQL estimates using 
the Laplacian approximation so the result will be different from that 
returned by glmmPQL.

> Maybe as additional comment, I specified a "NULL-model"
> 
>>dummy<-rep(1,469)
>>glmm3<-GLMM(count~1,random=~1|dummy,poisson)
> 
> resp
> 
>>Dataset$dummy<-1
>>glmm3<-glmmPQL(count~1,random=~1|dummy,poisson)
> 
> 
> and GLMM gave the correct value (logLik = (Null deviance/2) from glm),whereas 
> glmmPQL calculated a logLik almost twice as high.
> 
> 
> =========(MASS)=====
> 
>>glmm1<-glmmPQL(lixt~thick+I(thick^2),random=~thick+I(thick^2)
> 
> |plantid,poisson,Dataset)
> iteration 1 
> [...] 
> iteration 10 
> 
>>summary(glmm1)
> 
> 
> Linear mixed-effects model fit by maximum likelihood
>  Data: Dataset 
>        AIC      BIC    logLik
>   1988.500 2030.006 -984.2502
> 
> Random effects:
>  Formula: ~thick + I(thick^2) | plantid
>  Structure: General positive-definite, Log-Cholesky parametrization
>             StdDev     Corr         
> (Intercept) 0.27914612 (Intr) thick 
> thick       0.03089333 -0.251       
> I(thick^2)  0.01867846 -0.878 -0.067
> Residual    1.37952490              
> 
> Variance function:
>  Structure: fixed weights
>  Formula: ~invwt 
> Fixed effects: lixt ~ thick + I(thick^2) 
>                 Value Std.Error  DF   t-value p-value
> (Intercept) -4.038800 0.4834792 446 -8.353617       0
> thick        2.371842 0.2642474 446  8.975837       0
> I(thick^2)  -0.279189 0.0337451 446 -8.273479       0
>  Correlation: 
>            (Intr) thick 
> thick      -0.970       
> I(thick^2)  0.896 -0.973
> 
> Standardized Within-Group Residuals:
>        Min         Q1        Med         Q3        Max 
> -1.2093444 -0.5972699 -0.3725736  0.2819543  6.1820947 
> 
> Number of Observations: 469
> Number of Groups: 21 
> =============================
> 
> Question 6: When I tried method="Laplace" an error occurred:
> 
>>g1<-GLMM(count~thick+I(thick^2),random=~thick+I(thick^2)
> 
> |plantid,poisson,data=Dataset,method="Laplace")
> Using optimizer nlm 
> Error: rank deficiency of ZtZ+W detected at column 11

> Is that indicating multicollinearity?

No, it is an indication that the variance-covariance of the random 
effects is not positive definite.

> Is there is a possibility to avoid this?

You will need to reduce the number of random effects terms.

> I got also from time to time the message
> 
> Omega[1] is not positive definite
> 
> Can I fix this somehow?

Same as above.

> Question 7: Is there a possibility to calculate something like a pseudo-r-
> square (e.g. 1-(logLik(Nullmodel)/loglike model)?)
> 
> Question 8: When I specify the whole model as fix as e.g:
> glm1<-glm(count=(thick+I(thick^2))+(thick+I(thick^2))%in%
> plantid,poisson,data=Dataset)
> Is the model than wrong or just less powerful? I guess the latter is the case, 
> but I would like to be sure. If in this version is just the power decreased, it 
> would be very helpful for me to use for the more complex models this approach, 
> because errors and warnings are becoming more frequent with more factors and 
> covariates.  
>  
> I know that there are a lot of questions (I am sorry for that), and I don't 
> expect that all will be commented. Most interesting for me are the questions 
> 1,6 and 8.
> Thank you very much in advance.
> Jan
> 
> 
> ========================================
> Jan Hattendorf
> University of Berne 
> Zoological Institute
> Baltzerstrasse 6 
> CH-3012 Berne 
> +41-31-631 4523
> jan.hattendorf at zos.unibe.ch
> http://www.cx.unibe.ch/zos/index.html
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html