[R-sig-ME] glmmPQL Help: Random Effect and Dispersion Parameter

Sat Jun 25 18:53:41 CEST 2011

This time with the enclosure :-)

On Sat, Jun 25, 2011 at 11:52 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Fri, Jun 24, 2011 at 2:28 PM, Yue Yu <parn.yy at gmail.com> wrote:
>> Thanks a lot, Dennis.
>>
>> The reason I am not using lme4 is that its estimated variance
>> component seems not desirable.
>>
>> If you use my simulated data and run the following line
>>
>> glm2 <- glmer(y~1+(1|alpha)+(1|beta),family=gaussian(link="log"),data=simu)
>>
>> the result will be
>>
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: y ~ 1 + (1 | alpha) + (1 | beta)
>>   Data: simu
>>  AIC   BIC logLik deviance
>>  508 529.2 -250      500
>> Random effects:
>>  Groups   Name        Variance   Std.Dev.
>>  alpha    (Intercept) 0.01957449 0.139909
>>  beta     (Intercept) 0.00080326 0.028342
>>  Residual             0.06888192 0.262454
>> Number of obs: 1500, groups: alpha, 100; beta, 5
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept)  1.13506    0.01907   59.52
>>
>> The Std.Dev of alpha, beta and residual is far way from the true value
>> in my simulation. While glmmPQL will give a better result.
>
> Hmm, in lme4a the results, shown in the enclosed, are much closer to
> the values used for simulation.  However, this example does show a
> deficiency in this implementation of glmer in that the estimate of the
> residual standard deviation is not calculated and not shown here.
>
>> But I still need to find the variance matrix for variance components
>> and the dispersion parameter, any suggestions?
>
> The best suggestion is don't do it.  Estimates of variance components
> are not symmetrically distributed (think of the simplest case of the
> estimator of the variance from an i.i.d Gaussian sample, which has a
> chi-squared distribution).
>
>> On Fri, Jun 24, 2011 at 09:52, Dennis Murphy <djmuser at gmail.com> wrote:
>>> Hi:
>>>
>>> glmmPQL has been around a while, and I suspect it was not meant to
>>> handle crossed random effects. This was one of the original
>>> motivations for the lme4 package, and it seems to work there, although
>>> it's using Gauss-Hermite approximations to the likelihood rather than
>>> PQL:
>>>
>>> library(lme4)
>>> mod1 <- lmer(y ~ 1 + (1 | beta) + (1 | alpha), data = simu)
>>>> summary(mod1)
>>> Linear mixed model fit by REML ['summary.mer']
>>> Formula: y ~ 1 + (1 | beta) + (1 | alpha)
>>>   Data: simu
>>> REML criterion at convergence: 584.4204
>>>
>>> Random effects:
>>>  Groups   Name        Variance Std.Dev.
>>>  alpha    (Intercept) 3.11128  1.7639
>>>  beta     (Intercept) 0.17489  0.4182
>>>  Residual             0.05405  0.2325
>>> Number of obs: 1500, groups: alpha, 100; beta, 5
>>>
>>> Fixed effects:
>>>            Estimate Std. Error t value
>>> (Intercept)   2.9167     0.2572   11.34
>>>
>>> Hopefully that's closer to what you had in mind. If not, take a look
>>> at Ben Bolker's GLMM wiki:
>>>
>>> http://glmm.wikidot.com/faq
>>>
>>> BTW, thank you for the nice reproducible example.
>>>
>>> Dennis
>>>
>>>
>>> On Thu, Jun 23, 2011 at 9:16 PM, Yue Yu <parn.yy at gmail.com> wrote:
>>>> Dear R users,
>>>>
>>>> I am currently doing a project in generalized mixed model, and I find
>>>> the R function glmmPQL in MASS can do this via PQL. But I am a newbie
>>>> in R, the input and output of glmmPQL confused me, and I wish I can
>>>> get some answers here.
>>>>
>>>> The model I used is a typical two-way generalized mixed model with
>>>> random subject (row) and block (column) effects and log link function,
>>>> y_{ij} = exp(\mu+\alpha_i+\beta_j)+\epsilon.
>>>>
>>>> I can generate a pseudo data by the following R code
>>>>
>>>> ===================================================
>>>> k <- 5; # Number of Blocks (Columns)
>>>> n <- 100; # Number of Subjects (Rows)
>>>> m <- 3; # Number of Replications in Each Cell
>>>>
>>>> sigma.a <- 0.5; # Var of Subjects Effects
>>>> sigma.b <- 0.1; # Var of Block Effects
>>>> sigma.e <- 0.01; # Var of Errors
>>>> mu <- 1; # Overall mean
>>>>
>>>> a <- rep(rnorm(n,0,sigma.a),each=k*m);
>>>> b <- rep(rep(rnorm(k,0,sigma.b),each=m),n);
>>>>
>>>> # Simulate responses y_{ij}=exp(\mu+\alpha_i+\beta_j)+e
>>>> y <- exp(mu+a+b)+rnorm(1,0,sigma.e);
>>>>
>>>> # Indicator vector of subject effects alpha
>>>> alpha <- rep(seq(1,n),each=k*m);
>>>>
>>>> # Indicator vector of block effects beta
>>>> beta <- rep(rep(seq(1:k),each=m),n);
>>>>
>>>> simu <- data.frame(y,beta,alpha)
>>>> ===================================================
>>>>
>>>> And I want to use glmmPQL to estimate the mean and variance
>>>> components, but I have several questions.
>>>>
>>>> 1. How to write the random effect formula?
>>>> I have tried
>>>> glm <- glmmPQL(y~1,random=~alpha+beta,family=gaussian(link="log"),data=simu);
>>>> but it did not work and got the error message "Invalid formula for groups".
>>>>
>>>> And the command
>>>> glm <- glmmPQL(y~1,random=~1|alpha/beta,family=gaussian(link="log"),data=simu)
>>>> worked, but the result was the nested "beta %in% alpha" variances,
>>>> which was not what I want.
>>>>
>>>> 2. How to find the estimated variance-covariance matrix for the
>>>> variance components, which should be the inverse of information
>>>> matrix.
>>>> I notice the output variable glm at apVar will give a similar
>>>> variance-covariance matrix, but it has the prefix "reStruct." and
>>>> attribute "Pars", what are these stand for? I can't find any
>>>> explanation in the help document.
>>>>
>>>> 3. I am also wondering if there is a way to calculate the dispersion
>>>> parameter or not?
>>>>
>>>> Anyone has some tips? Any suggestions will be greatly appreciated.
>>>>
>>>> Best,
>>>>
>>>> Yue Yu
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
-------------- next part --------------

R version 2.14.0 Under development (unstable) (2011-06-24 r56210)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> k <- 5; # Number of Blocks (Columns)
> n <- 100; # Number of Subjects (Rows)
> m <- 3; # Number of Replications in Each Cell
> 
> sigma.a <- 0.5; # Var of Subjects Effects
> sigma.b <- 0.1; # Var of Block Effects
> sigma.e <- 0.01; # Var of Errors
> mu <- 1; # Overall mean
> 
> set.seed(12345)
> a <- rep(rnorm(n,0,sigma.a),each=k*m);
> b <- rep(rep(rnorm(k,0,sigma.b),each=m),n);
> 
> # Simulate responses y_{ij}=exp(\mu+\alpha_i+\beta_j)+e
> y <- exp(mu+a+b)+rnorm(1,0,sigma.e);
> 
> # Indicator vector of subject effects alpha
> alpha <- rep(seq(1,n),each=k*m);
> 
> # Indicator vector of block effects beta
> beta <- rep(rep(seq(1:k),each=m),n);
> 
> simu <- data.frame(y,beta,alpha)
> 
> library(lme4a)
Loading required package: Matrix
Loading required package: lattice

Attaching package: ?Matrix?

The following object(s) are masked from ?package:base?:

    det

Loading required package: minqa
Loading required package: Rcpp
Loading required package: MatrixModels

Attaching package: ?MatrixModels?

The following object(s) are masked from ?package:stats?:

    getCall

> summary(glm2 <- glmer(y ~ 1 + (1|alpha) + (1|beta), simu, gaussian(link="log"), verbose=2L))
npt = 4 , n =  2 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   7:      512.232;0.600458 0.986477 
  0.0020:  14:      509.499;0.554646 0.887092 
 0.00020:  51:      490.819;0.536841 0.0779844 
 2.0e-05:  60:      490.815;0.539243 0.0788170 
 2.0e-06:  62:      490.815;0.539294 0.0787849 
 2.0e-07:  66:      490.815;0.539295 0.0787820 
At return
 72:     490.81485: 0.539295 0.0787822
npt = 5 , n =  3 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   6:      495.473;0.539295 0.200000  1.10770 
  0.0020:  14:      491.648;0.546593 0.109993  1.11128 
 0.00020:  24:      490.787;0.539576 0.0785051  1.09634 
 2.0e-05:  28:      490.787;0.539552 0.0788343  1.09663 
 2.0e-06:  37:      490.787;0.539619 0.0787000  1.09667 
 2.0e-07:  53:      490.787;0.539613 0.0787062  1.09676 
At return
 66:     490.78710: 0.539613 0.0787075  1.09676
Generalized linear mixed model fit by maximum likelihood ['summary.mer']
 Family: gaussian 
Formula: y ~ 1 + (1 | alpha) + (1 | beta) 
   Data: simu 
      AIC       BIC    logLik  deviance 
 496.7871  512.7268 -245.3935  490.7871 

Random effects:
 Groups Name        Variance Std.Dev.
 alpha  (Intercept) 0.291182 0.53961 
 beta   (Intercept) 0.006195 0.07871 
Number of obs: 1500, groups: alpha, 100; beta, 5

Fixed effects:
            Estimate Std. Error z value
(Intercept)  1.09676    0.06532   16.79
> 
> proc.time()
   user  system elapsed 
 11.780   0.270  12.248