[R-sig-ME] Multilevel logistic regression

Brant Inman brant.inman at me.com
Sun Mar 22 15:22:18 CET 2009


Ben,

Thanks for confirming my findings. Two questions for you:

1) Is it normal that lmer does NOT find the following 3 formulae to be equivalent?

f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
f3 <- lmer( success ~ covariates + (1 | group), data=long ...

2) How did you hack lmer to avoid this problem and did it compute the result faster with the wide dataset than the long one?


Brant


----------------------------------------------------------------------------------------------------------------------------
 
On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
>   I tried this out with a hacked version of lme4 that removes
>the test on the number of observations, and get reasonable answers
>for f3, to wit:
>
>Random effects:
> Groups Name        Variance Std.Dev. Corr
> SOURCE (Intercept) 0.78845  0.88795
>        TELDUM      0.24216  0.49210  -0.281
>        MAILDUM     0.52215  0.72260  -0.365  0.317
>Number of obs: 105, groups: SOURCE, 48
>
>Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
>(Intercept)  1.12762    0.19519   5.777  7.6e-09 ***
>RESPISRR     0.20714    0.21497   0.964 0.335258
>TELDUM      -0.20251    0.09141  -2.215 0.026737 *
>MAILDUM     -0.56094    0.14708  -3.814 0.000137 ***
>---
>
>  The random effects (esp for telephone/mail) are a little different.
>It doesn't look like the software used in the chapter estimates
>correlations among random effects?
>
>## Table 6.4 Models for response rates in different conditions
>## Fixed part           conditions fixed       conditions random
>##                      coeff. (s.e.)          coeff. (s.e.)
>## Predictor
>## intercept            0.90 (.14)             1.17 (.21)
>## resptype             0.53 (.06)             0.20 (.23)
>## telephone            -0.16 (.02)            -0.20 (.10)
>## mail                 -0.49 (.03)            -0.58 (.16)
>## Random part
>## intercept1           1.00                   1.00
>## intercept2           0.86 (.18)             0.87 (.20)
>## telephone                                   0.26 (.08)
>## mail                                        0.59 (.20)
>
>Eliminating the correlation among random effects:
>
>f3B <- lmer(y  ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
>                family=binomial, data=wide)
>summary(f3B)
>
>Random effects:
> Groups Name        Variance Std.Dev.
> SOURCE (Intercept) 0.72203  0.84972
> SOURCE TELDUM      0.23582  0.48562
> SOURCE MAILDUM     0.43923  0.66274
>Number of obs: 105, groups: SOURCE, 48
>
>
>  telephone/mail random effects still estimated higher.
>Tried nAGQ=6 (crank up Gauss-Hermite quadrature):
>
>Random effects:
> Groups Name        Variance Std.Dev.
> SOURCE (Intercept) 0.74275  0.86183
> SOURCE TELDUM      0.24565  0.49563
> SOURCE MAILDUM     0.28567  0.53448
>Number of obs: 105, groups: SOURCE, 48
>
>Brings mail RE down (*below* PQL est.) but telephone RE is still high.
>
>
>
>
>Brant Inman wrote:
>> Thierry et al:
>> 
>> I think I solved my problem and it provides some insight into the way
>> lmer handles binomial data, so I will share the findings here. First,
>> I have made two datasets available on the web, a long format and a
>> wide format version of the "metaresp" data of Joop Hox..  They can be
>> found here
>> 
>> http://www.duke.edu/~bi6/
>> 
>> Hox has a draft of the chapter of interest that discusses the
>> metaresp dataset and the modeling process/problem that I am trying to
>> solve.  Note that the results that I am trying to reproduce are in
>> Tables 6.3 and 6.4.  The chapter can be found at:
>> 
>> http://www.geocities.com/joophox/papers/chap6.pdf
>> 
>> Now here are the models that I fit with lmer.  Assume that the wide
>> version of the data is called "wide" and the long version "long". 
>> -----------------------------
>> 
>> y <- cbind(wide$SUCCESS, wide$FAIL)
>> 
>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide) 
>> summary(f1)
>> 
>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>> family=binomial, data=wide) summary(f2)
>> 
>> f3 <- lmer(y ~  ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM + MAILDUM
>> | SOURCE), family=binomial, data=wide) summary(f3)
>> 
>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM +
>> MAILDUM | SOURCE), family=binomial, data=long) summary(f4)
>> 
>> -------------------------------
>> 
>> Models f1, f2, and f4 work and reproduce the results of Hox.  Model
>> f4 takes a hell of a long time to compute, but it seems to give the
>> expected results.  Model f3, which I assumed (wrongly) would be the
>> same as f4, does not work.  Instead, when it is run, you get the
>> error message:
>> 
>>> Error in mer_finalize(ans) : q = 240 > n = 105
>> 
>> I guess the question that I have now is: did I do something wrong
>> with model f3 or is lmer doing something unusual?  My assumption that
>> models f3 and f4 were the same comes from MASS4 p190 where Ripley
>> describes the glm function for logistic regression.
>> 
>> I very much appreciate any insight.
>> 
>> Brant
>> 
>> #####################################################################################
>> 
>> 
>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>> <Thierry.ONKELINX at inbo.be> wrote:
>>> Dear Brant,
>>> 
>>> The model is too complex. You have maximum three observations for
>>> each level of the random effect. Allowing for a random intercept
>>> and two random slopes does not make much sense then. Does it?
>>> 
>>> HTH,
>>> 
>>> Thierry
>>> 
>>> 
>>> ------------------------------------------------------------------------
>>>  ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>> Research Institute for Nature and Forest Cel biometrie,
>>> methodologie en kwaliteitszorg / Section biometrics, methodology
>>> and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium 
>>> tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be
>>> 
>>> To call in the statistician after the experiment is done may be no
>>> more than asking him to perform a post-mortem examination: he may
>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>> Fisher
>>> 
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> 
>>> The combination of some data and an aching desire for an answer
>>> does not ensure that a reasonable answer can be extracted from a
>>> given body of data. ~ John Tukey
>>> 
>>> -----Oorspronkelijk bericht----- Van:
>>> r-sig-mixed-models-bounces at r-project.org 
>>> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Brant
>>> Inman Verzonden: donderdag 19 maart 2009 3:11 Aan:
>>> r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] Multilevel
>>> logistic regression
>>> 
>>> 
>>> lmer Experts:
>>> 
>>> I am trying to use lmer to duplicate the results found in Joop
>>> Hox's book "Multilevel Analysis: technique and applications" 2002.
>>> In chapter 6 of his book he shows an example of multilevel logistic
>>>  regression for a meta-analysis of survey response rates.  The data
>>> are available in the file "metaresp.xls" at his website:
>>> 
>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>> 
>>> The dataset includes the following variables of interest:
>>> 
>>> Individual level (Level 1) variables: TELDUM	 = telephone
>>> questioning MAILDUM  = mail questioning RESPONSE = the outcome of
>>> interest, the study response rate DENOM    = the number of people
>>> questioned
>>> 
>>> Study/group level (Level 2) variables: SOURCE 	 = the study
>>> identifier YEAR	 = year of study SALIENCY = how salient the
>>> questionnaire was (0 to 2) RESPISRR = the way the response rate was
>>> calculated
>>> 
>>> 
>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>> 
>>> SUCCESS <- as.integer(RESPISRR*DENOM) y  	<- cbind(SUCCESS,
>>> DENOM-SUCCESS)
>>> 
>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>> family=binomial(link=logit))
>>> 
>>> 
>>> Joop then adds a couple Level 1 variables (Table 6.3):
>>> 
>>> f2 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (1 | SOURCE), 
>>> family=binomial(link=logit))
>>> 
>>> 
>>> He then says that these two Level 1 variables should be allowed to
>>>  vary across studies (varying slopes).  When I try to fit what I 
>>> believe to be the correct model, I get an error
>>> 
>>> 
>>> f3 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (TELNUM | SOURCE) + 
>>> (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>> 
>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>> 
>>> 
>>> Can anyone tell me what I am doing wrong here?  Thanks so much in
>>>  advance.
>>> 
>>> Brant Inman Duke University Medical Center
>>> 
>>> [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________ 
>>> R-sig-mixed-models at r-project.org mailing list 
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> 
>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
>>> schrijver weer en binden het INBO onder geen enkel beding, zolang
>>> dit bericht niet bevestigd is door een geldig ondertekend document.
>>> The views expressed in  this message and any annex are purely those
>>> of the writer and may not be regarded as stating an official
>>> position of INBO, as long as the message is not confirmed by a duly
>>>  signed document.
>>> 
>>> 
>> 
>> _______________________________________________ 
>> R-sig-mixed-models at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>-- 
>Ben Bolker
>Associate professor, Biology Dep't, Univ. of Florida
>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
>




More information about the R-sig-mixed-models mailing list