[R] GLMM(..., family=binomial(link="cloglog"))?

Spencer Graves spencer.graves at pdf.com
Wed Jun 9 16:40:24 CEST 2004


Hi, Göran:

Thanks for the analysis. Unfortunately, it still leaves me with two
problems. First, I'm dealing with extremely small defect rates across
thousands to millions of Bernoulli trials, so creating bigDF would
require far more memory and processing power than I have available.
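
(For plain glm, by the way, the grouped representation can also be
written with a two-column (successes, failures) response, which avoids
materializing one row per Bernoulli trial; whether GLMM accepts that
response form is something I'd have to check. Using the DF from your
code below:)

-------------------------------------------------------------------
succ <- round(DF$y * DF$n)     # successes per group
fail <- DF$n - succ            # failures per group
glm(cbind(succ, fail) ~ 1, family = binomial)
-------------------------------------------------------------------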

Second, there still seems to be substantial bias in the estimates. Do
you have any thoughts about the origins of this bias and what I can do
about it? Maybe I should try your bigDF GLMM call with method="Laplace".
Since Lindsey's "glmm" claims to be doing Gauss-Hermite quadrature, I
wonder if adaptive Gauss-Hermite will fix this bias problem. I guess
I'll just have to try it and find out.
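
For the record, here is roughly what I have in mind for the quadrature
step: a minimal, non-adaptive Gauss-Hermite sketch (mine, untested on
the real problem) for the binomial-logit-normal likelihood, using
gauss.quad() from the 'statmod' package. The adaptive version would
recenter and rescale the nodes at each group's conditional mode.

-------------------------------------------------------------------
require(statmod)
gh <- gauss.quad(20, kind = "hermite")   # nodes z_k, weights w_k

## Marginal negative log-likelihood for intercept b and random-effect
## sd s.  For each group i the integral
##   int Bin(Y_i | n_i, plogis(b + s*u)) dnorm(u) du
## is approximated by
##   sum_k (w_k / sqrt(pi)) * Bin(Y_i | n_i, plogis(b + s*sqrt(2)*z_k))
negll <- function(par, Y, n) {
    b <- par[1]
    s <- exp(par[2])                     # optimize log(sd), keeps s > 0
    p <- plogis(b + s * sqrt(2) * gh$nodes)
    -sum(log(sapply(seq(along = Y), function(i)
        sum(gh$weights / sqrt(pi) * dbinom(Y[i], n[i], p)))))
}

## e.g.:  optim(c(0, 0), negll, Y = Y, n = rep(100, length(Y)))
-------------------------------------------------------------------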

Thanks again.
Spencer Graves

Go"ran Brostro"m wrote:

>On Tue, Jun 08, 2004 at 08:32:24AM -0700, Spencer Graves wrote:
>
>>Hi, Doug: 
>>
>>     Thanks.  I'll try the things you suggest.  The observed 
>>proportions ranged from roughly 0.2 to 0.8 in 100 binomial random 
>>samples where sigma is at most 0.05.  Jim Lindsey's "glmm" does 
>>Gauss-Hermite quadrature, but I don't know if it bothers with the 
>>adaptive step.  With it, I've seen estimates of the variance component 
>>ranging from 0.4 to 0.7 or so.  Since I simulated from a normal with 
>>mean 0 and standard deviation 1, the algorithm was clearly 
>>underestimating what was simulated.  My next step, I think, is to 
>>program adaptive Gauss-Hermite quadrature for something closer to my 
>>real problem (as you just suggested), and see what I get. 
>
>[...]
>
>>Douglas Bates wrote:
>>
>>>Spencer Graves <spencer.graves at pdf.com> writes:
>>>
>[...]
>
>>>>	  Consider the following:
>>>>
>>>>>set.seed(1); N <- 10
>>>>>z <- rnorm(N)
>>>>>pz <- inv.logit(z)
>>>>>DF <- data.frame(z=z, pz=pz, y=rbinom(N, 100, pz)/100, n=100,
>>>>>                 smpl=factor(1:N))
>>>>>GLMM(y~1, family=binomial, data=DF, random=~1|smpl, weights=n)
>
>I think the "weights=n" is the problem, i.e., you summarize Bernoulli
>data to Bin(100, p) data, and that gives a completely different estimate
>of the variance of the random effect. (This looks like an error in lme4
>to me, or am I missing something? Doug?) Really, the two ways of
>representing the data should give equivalent analyses, but they don't.
>The same phenomenon appears in 'glm', i.e. without random effects, but
>there only in the residual deviance, df, and AIC.
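>
>(The glm part is easy to check directly; a quick sketch using the same
>data frames as below:
>
>    coef(glm(y ~ 1, family = binomial, data = DF, weights = n))
>    coef(glm(y ~ 1, family = binomial, data = bigDF))
>
>The coefficients agree, while the reported deviance, df, and AIC
>differ.)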
>
>The following is a small function calling GLMM from lme4 twice, first as
>before, and then with the data expanded into individual Bernoulli
>observations. I also run 'glmmML' in the latter case (since glmmML only
>allows the Bernoulli representation):
>
>-------------------------------------------------------------------   
>sim <- function(N = 10, grpSize = 10, std = 1){
>    require(glmmML)
>    require(lme4)
>
>    set.seed(1)
>    z <- rnorm(N, mean = 0, sd = std)
>    ## plogis() in 'stats' is identical to inv.logit()
>    pz <- plogis(z)
>    Y <- rbinom(N, grpSize, pz)
>
>    ## 'Summary' data frame: one row per group, binomial counts:
>
>    DF <- data.frame(z = z, pz = pz,
>                     y = Y / grpSize,
>                     n = grpSize,
>                     smpl = factor(1:N))
>
>    fit1 <- GLMM(y ~ 1, family = binomial, data = DF,
>                 random = ~1|smpl, weights = n)
>    ## fit1 <- glm(y ~ 1, family = binomial, data = DF, weights = n)
>
>    ## 'Individual' data frame: one row per Bernoulli trial:
>
>    n <- N * grpSize
>    z <- rep(z, each = grpSize)
>    pz <- rep(pz, each = grpSize)
>
>    ## Expand each group's count Y[i] into Y[i] ones followed by
>    ## grpSize - Y[i] zeros; the 'if' guards against Y[i] == 0, where
>    ## the index range would otherwise run backwards.
>    y <- numeric(n)
>    for (i in 1:N)
>        if (Y[i] > 0)
>            y[((i - 1) * grpSize + 1):((i - 1) * grpSize + Y[i])] <- 1
>    smpl <- factor(rep(1:N, each = grpSize))
>
>    bigDF <- data.frame(z = z, pz = pz, y = y,
>                        smpl = smpl)
>    fit2 <- GLMM(y ~ 1, family = binomial, data = bigDF, random = ~1|smpl)
>    ## fit2 <- glm(y ~ 1, family = binomial, data = bigDF)
>
>    ## glmmML only accepts the Bernoulli representation:
>    fitML <- glmmML(y ~ 1, family = binomial,
>                    data = bigDF,
>                    cluster = bigDF$smpl)
>
>    list(fit1 = fit1,
>         fit2 = fit2,
>         fitML = fitML)
>}
>----------------------------------------------------------------
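>
>(The output below is from a call with the defaults, i.e. something
>like sim(N = 10, grpSize = 10, std = 1).)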
>
>Output:
>
>$fit1
>Generalized Linear Mixed Model
>
>Family: binomial family with logit link
>Fixed: y ~ 1 
>Data: DF 
> log-likelihood:  -11.59919 
>Random effects:
> Groups Name        Variance   Std.Dev.  
> smpl   (Intercept) 3.0384e-10 1.7431e-05
>
>Estimated scale (compare to 1)  1.391217 
>
>Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
>(Intercept)  0.48955    0.20602  2.3762  0.01749
>
>Number of Observations: 10
>Number of Groups: 10 
>
>$fit2
>Generalized Linear Mixed Model
>
>Family: binomial family with logit link
>Fixed: y ~ 1 
>Data: bigDF 
> log-likelihood:  -64.8084 
>Random effects:
> Groups Name        Variance Std.Dev.
> smpl   (Intercept) 0.58353  0.7639  
>
>Estimated scale (compare to 1)  0.9563351 
>
>Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
>(Intercept)  0.52811    0.32286  1.6357   0.1019
>
>Number of Observations: 100
>Number of Groups: 10 
>
>$fitML
>
>Call:  glmmML(formula = y ~ 1, data = bigDF, 
>              cluster = bigDF$smpl, family = binomial) 
>
>
>              coef se(coef)     z Pr(>|z|)
>(Intercept) 0.5587   0.3313 1.686   0.0917
>
>Standard deviation in mixing distribution:  0.764 
>Std. Error:                                 0.3655 
>
>Residual deviance: 129.5 on 98 degrees of freedom       AIC: 133.5 
>---------------
>Note the big difference in the estimated random-effect sd between the
>two lme4 runs! Note also how close the estimates from the second lme4
>run and from glmmML are to each other.
>
>[...]
>
>Go"ran
>  
>



