[R] Likelihood ratio test between glm and glmer fits

Thu Jul 17 13:29:41 CEST 2008

On Thu, Jul 17, 2008 at 2:50 AM, Rune Haubo <rhbc at imm.dtu.dk> wrote:
> 2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at med.kuleuven.be>:
>> well, for computing the p-value you need to use pchisq() and dchisq() (check
>> ?dchisq for more info). For model fits with a logLik method you can directly
>> use the following simple function:
>>
>> lrt <- function (obj1, obj2) {
>>    L0 <- logLik(obj1)
>>    L1 <- logLik(obj2)
>>    L01 <- as.vector(- 2 * (L0 - L1))
>>    df <- attr(L1, "df") - attr(L0, "df")
>>    list(L01 = L01, df = df,
>>        "p-value" = pchisq(L01, df, lower.tail = FALSE))
>> }
>>
>> library(lme4)
>> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>>              family = binomial, data = cbpp)
>> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>              family = binomial, data = cbpp)
>>
>> lrt(gm0, gm1)
>
> Yes, that seems quite natural, but then try to compare with the deviance:
>
> logLik(gm0)
> logLik(gm1)
>
> (d0 <- deviance(gm0))
> (d1 <- deviance(gm1))
> (LR <- d0 - d1)
> pchisq(LR, 1, lower = FALSE)
>
> Obviously the deviance in glm is *not* twice the negative
> log-likelihood as it is in glmer. The question remains which of these
> two quantities is appropriate for comparison. I am not sure exactly
> how the deviance and/or log-likelihood are calculated in glmer, but my
> feeling is that one should trust the deviance rather than the
> log-likelihoods for these purposes. This is supported by the following
> comparison: Ad an arbitrary random effect with a close-to-zero
> variance and note the deviance:
>
> tmp <- rep(1:4, each = nrow(cbpp)/4)
> gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
>             family = binomial, data = cbpp)
> (d2 <- deviance(gm2))
>
> This deviance is very close to that obtained from the glm model.
>
> I have included the mixed-models mailing list in the hope that someone
> could explain how the deviance is computed in glmer and why deviances,
> but not likelihoods are comparable to glm-fits.

In that example I think the problem may be that I have not yet written
the code to adjust the deviance of the glmer fit for the null
deviance.

>> However, there are some issues regarding this likelihood ratio test.
>>
>> 1) The null hypothesis is on the boundary of the parameter space, i.e., you
>> test whether the variance for the random effect is zero. For this case the
>> assumed chi-squared distribution for the LRT may *not* be totally
>> appropriate and may produce conservative p-values. There is some theory
>> regarding this issue, which has shown that the reference distribution for
>> the LRT in this case is a mixture of a chi-squared(df = 0) and
>> chi-squared(df = 1). Another option is to use simulation-based approach
>> where you can approximate the reference distribution of the LRT under the
>> null using simulation. You may check below for an illustration of this
>> procedure (not-tested):
>>
>> X <- model.matrix(gm0)
>> coefs <- coef(gm0)
>> pr <- plogis(c(X %*% coefs))
>> n <- length(pr)
>> new.dat <- cbpp
>> Tobs <- lrt(gm0, gm1)$L01
>> B <- 200
>> out.T <- numeric(B)
>> for (b in 1:B) {
>>    y <- rbinom(n, cbpp$size, pr)
>>    new.dat$incidence <- y
>>    fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>>    fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
>>    out.T[b] <- lrt(fit0, fit1)$L01
>> }
>> # estimate p-value
>> (sum(out.T >= Tobs) + 1) / (B + 1)
>>
>>
>> 2) For the glmer fit you have to note that you work with an approximation to
>> the log-likelihood (obtained using numerical integration) and not the actual
>> log-likelihood.
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>> ----
>> Dimitris Rizopoulos
>> Biostatistical Centre
>> School of Public Health
>> Catholic University of Leuven
>>
>> Address: Kapucijnenvoer 35, Leuven, Belgium
>> Tel: +32/(0)16/336899
>> Fax: +32/(0)16/337015
>> Web: http://med.kuleuven.be/biostat/
>>     http://www.student.kuleuven.be/~m0390867/dimitris.htm
>>
>>
>> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>>
>>> Dear list,
>>> I am fitting a logistic multi-level regression model and need to  test the
>>> difference between the ordinary logistic regression from a  glm() fit and
>>> the mixed effects fit from glmer(), basically I want  to do a likelihood
>>> ratio test between the two fits.
>>>
>>>
>>> The data are like this:
>>> My outcome is a (1,0) for health status, I have several (1,0) dummy
>>>  variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
>>>  divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is  continuous (20
>>> to 100).
>>> My higher level is called munid and has 581 levels.
>>> The data have 45243 observations.
>>>
>>> Here are my program statements:
>>>
>>> #GLM fit
>>>
>>> ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
>>>  data=mx.merge)
>>> #GLMER fit
>>>
>>> ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
>>>  data=mx.merge)
>>>
>>> I cannot find a method in R that will do the LR test between a glm  and a
>>> glmer fit, so I try to do it using the liklihoods from both  models
>>>
>>> #form the likelihood ratio test between the glm and glmer fits
>>> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>>>
>>>>    ML
>>>
>>> 79.60454
>>> attr(,"nobs")
>>>    n
>>> 45243
>>> attr(,"nall")
>>>    n
>>> 45243
>>> attr(,"df")
>>> [1] 14
>>> attr(,"REML")
>>> [1] FALSE
>>> attr(,"class")
>>> [1] "logLik"
>>>
>>> #Get the associated p-value
>>> dchisq(x2,14)
>>>         ML
>>>>
>>>> 5.94849e-15
>>>
>>> Which looks like an improvement in model fit to me.  Am I seeing  this
>>> correctly or are the two models even able to be compared? they  are both
>>> estimated via maximum likelihood, so they should be, I think.
>>> Any help would be appreciated.
>>>
>>> Corey
>>>
>>> Corey S. Sparks, Ph.D.
>>>
>>> Assistant Professor
>>> Department of Demography and Organization Studies
>>> University of Texas San Antonio
>>> One UTSA Circle
>>> San Antonio, TX 78249
>>> email:corey.sparks at utsa.edu
>>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Rune Haubo Bojesen Christensen
>
> Master Student, M.Sc. Eng.
> Phone: (+45) 30 26 45 54
> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>
> DTU Informatics, Section for Statistics
> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>