[R-sig-ME] Multilevel logistic regression
Brant Inman
brant.inman at me.com
Sun Mar 22 15:22:18 CET 2009
Ben,
Thanks for confirming my findings. Two questions for you:
1) Is it normal that lmer does NOT find the following three formulae to be equivalent? (See the glm() sketch below.)
f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
f3 <- lmer( success ~ covariates + (1 | group), data=long ...
2) How did you hack lmer to avoid this problem, and did it compute the result faster with the wide dataset than with the long one?
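
For reference on question 1: with ordinary glm() the three forms do give identical fixed-effect estimates, which is why I expected the same from lmer. A minimal sketch with made-up data (all object names here are illustrative only, not from the metaresp data):

set.seed(1)
n    <- 20
x    <- rnorm(n)
size <- sample(5:15, n, replace = TRUE)
succ <- rbinom(n, size, plogis(-0.5 + 0.8 * x))
fail <- size - succ
dat  <- data.frame(x, succ, fail, size)

## (a) two-column (successes, failures) response
g1 <- glm(cbind(succ, fail) ~ x, family = binomial, data = dat)
## (b) proportion response with the number of trials as weights
g2 <- glm(succ / size ~ x, weights = size, family = binomial, data = dat)
## (c) long (Bernoulli) format: one 0/1 row per trial
dat_long <- data.frame(
  x = rep(dat$x, dat$size),
  y = unlist(mapply(function(s, f) c(rep(1L, s), rep(0L, f)), dat$succ, dat$fail))
)
g3 <- glm(y ~ x, family = binomial, data = dat_long)

cbind(coef(g1), coef(g2), coef(g3))   # identical estimates

The residual deviances differ (the Bernoulli expansion drops the binomial normalising constant), but the coefficient estimates and standard errors agree.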
Brant
----------------------------------------------------------------------------------------------------------------------------
On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
> I tried this out with a hacked version of lme4 that removes
> the test on the number of observations, and got reasonable answers
>for f3, to wit:
>
>Random effects:
> Groups Name Variance Std.Dev. Corr
> SOURCE (Intercept) 0.78845 0.88795
> TELDUM 0.24216 0.49210 -0.281
> MAILDUM 0.52215 0.72260 -0.365 0.317
>Number of obs: 105, groups: SOURCE, 48
>
>Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
>(Intercept) 1.12762 0.19519 5.777 7.6e-09 ***
>RESPISRR 0.20714 0.21497 0.964 0.335258
>TELDUM -0.20251 0.09141 -2.215 0.026737 *
>MAILDUM -0.56094 0.14708 -3.814 0.000137 ***
>---
>
> The random effects (especially for telephone/mail) are a little different.
> It doesn't look like the software used in the chapter estimates
> correlations among the random effects?
>
>## Table 6.4  Models for response rates in different conditions
>##                       conditions fixed   conditions random
>##                       coeff. (s.e.)      coeff. (s.e.)
>## Fixed part (Predictor)
>##   intercept            0.90 (.14)         1.17 (.21)
>##   resptype             0.53 (.06)         0.20 (.23)
>##   telephone           -0.16 (.02)        -0.20 (.10)
>##   mail                -0.49 (.03)        -0.58 (.16)
>## Random part
>##   intercept1           1.00               1.00
>##   intercept2           0.86 (.18)         0.87 (.20)
>##   telephone                               0.26 (.08)
>##   mail                                    0.59 (.20)
>
>Eliminating the correlation among random effects:
>
>f3B <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
> family=binomial, data=wide)
>summary(f3B)
>
>Random effects:
> Groups Name Variance Std.Dev.
> SOURCE (Intercept) 0.72203 0.84972
> SOURCE TELDUM 0.23582 0.48562
> SOURCE MAILDUM 0.43923 0.66274
>Number of obs: 105, groups: SOURCE, 48
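
An aside, not part of the quoted message: in later lme4 versions the same uncorrelated structure can be written with the double-bar shorthand, which for numeric covariates such as these dummies expands to (1 | SOURCE) + (0 + TELDUM | SOURCE) + (0 + MAILDUM | SOURCE), and binomial models are fitted with glmer() rather than lmer(). A sketch, with f3B_alt as an illustrative name and assuming the same y, wide, and column names as above:

f3B_alt <- glmer(y ~ RESPISRR + TELDUM + MAILDUM +
                   (1 + TELDUM + MAILDUM || SOURCE),
                 family = binomial, data = wide)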
>
>
> The telephone/mail random effects are still estimated higher.
> I tried nAGQ=6 (cranking up the adaptive Gauss-Hermite quadrature):
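
The exact call is not quoted above; a plausible form, with f3C as a purely illustrative name and assuming update() carries the family and data through on that lme4 version, would be:

f3C <- update(f3B, nAGQ = 6)
summary(f3C)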
>
>Random effects:
> Groups Name Variance Std.Dev.
> SOURCE (Intercept) 0.74275 0.86183
> SOURCE TELDUM 0.24565 0.49563
> SOURCE MAILDUM 0.28567 0.53448
>Number of obs: 105, groups: SOURCE, 48
>
> This brings the mail random-effect variance down (*below* the PQL estimate), but the telephone random effect is still high.
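
For comparison with the chapter's PQL estimates, a PQL fit could be attempted roughly as sketched below; this is not from the original thread, and it assumes the same wide data with SUCCESS and FAIL columns and that glmmPQL accepts the two-column binomial response:

library(MASS)   # glmmPQL
library(nlme)   # pdDiag, for uncorrelated random effects

fPQL <- glmmPQL(cbind(SUCCESS, FAIL) ~ RESPISRR + TELDUM + MAILDUM,
                random = list(SOURCE = pdDiag(~ TELDUM + MAILDUM)),
                family = binomial, data = wide)
summary(fPQL)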
>
>
>
>
>Brant Inman wrote:
>> Thierry et al:
>>
>> I think I solved my problem and it provides some insight into the way
>> lmer handles binomial data, so I will share the findings here. First,
>> I have made two datasets available on the web, a long format and a
>> wide format version of the "metaresp" data of Joop Hox. They can be
>> found here:
>>
>> http://www.duke.edu/~bi6/
>>
>> Hox has a draft of the chapter of interest that discusses the
>> metaresp dataset and the modeling process/problem that I am trying to
>> solve. Note that the results that I am trying to reproduce are in
>> Tables 6.3 and 6.4. The chapter can be found at:
>>
>> http://www.geocities.com/joophox/papers/chap6.pdf
>>
>> Now here are the models that I fit with lmer. Assume that the wide
>> version of the data is called "wide" and the long version "long".
>> -----------------------------
>>
>> y <- cbind(wide$SUCCESS, wide$FAIL)
>>
>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide)
>> summary(f1)
>>
>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>> family=binomial, data=wide)
>> summary(f2)
>>
>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM +
>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=wide)
>> summary(f3)
>>
>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM +
>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=long)
>> summary(f4)
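
Not part of the original message: one way the long Bernoulli-format data could have been built from the wide data. A sketch assuming wide has the SUCCESS and FAIL columns used above, with ntrials and long2 as purely illustrative names:

ntrials <- wide$SUCCESS + wide$FAIL
long2   <- wide[rep(seq_len(nrow(wide)), ntrials), ]   # one row per trial
long2$SUCCESS <- unlist(mapply(function(s, f) c(rep(1L, s), rep(0L, f)),
                               wide$SUCCESS, wide$FAIL))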
>>
>> -------------------------------
>>
>> Models f1, f2, and f4 work and reproduce the results of Hox. Model
>> f4 takes a hell of a long time to compute, but it seems to give the
>> expected results. Model f3, which I assumed (wrongly) would be the
>> same as f4, does not work. Instead, when it is run, you get the
>> error message:
>>
>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>
>> I guess the question that I have now is: did I do something wrong
>> with model f3, or is lmer doing something unusual? My assumption that
>> models f3 and f4 were the same comes from MASS4, p. 190, where Venables
>> and Ripley describe the glm function for logistic regression.
>>
>> I very much appreciate any insight.
>>
>> Brant
>>
>> #####################################################################################
>>
>>
>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>> <Thierry.ONKELINX at inbo.be> wrote:
>>> Dear Brant,
>>>
>>> The model is too complex. You have at most three observations for
>>> each level of the random effect. Allowing for a random intercept
>>> and two random slopes then does not make much sense, does it?
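
A quick way to check the number of rows per study, sketched here rather than taken from the message, assuming the data frame is called wide as in the later replies:

table(table(wide$SOURCE))   # distribution of conditions (rows) per study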
>>>
>>> HTH,
>>>
>>> Thierry
>>>
>>>
>>> ----------------------------------------------------------------------------
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
>>> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
>>> Gaverstraat 4, 9500 Geraardsbergen, Belgium
>>> tel. +32 54/436 185
>>> Thierry.Onkelinx at inbo.be
>>> www.inbo.be
>>>
>>> To call in the statistician after the experiment is done may be no
>>> more than asking him to perform a post-mortem examination: he may
>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>> Fisher
>>>
>>> The plural of anecdote is not data. ~ Roger Brinner
>>>
>>> The combination of some data and an aching desire for an answer
>>> does not ensure that a reasonable answer can be extracted from a
>>> given body of data. ~ John Tukey
>>>
>>> -----Original Message-----
>>> From: r-sig-mixed-models-bounces at r-project.org
>>> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Brant Inman
>>> Sent: Thursday, 19 March 2009 3:11
>>> To: r-sig-mixed-models at r-project.org
>>> Subject: [R-sig-ME] Multilevel logistic regression
>>>
>>>
>>> lmer Experts:
>>>
>>> I am trying to use lmer to duplicate the results found in Joop
>>> Hox's book "Multilevel Analysis: Techniques and Applications" (2002).
>>> In chapter 6 of his book he shows an example of multilevel logistic
>>> regression for a meta-analysis of survey response rates. The data
>>> are available in the file "metaresp.xls" at his website:
>>>
>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>>
>>> The dataset includes the following variables of interest:
>>>
>>> Individual-level (Level 1) variables:
>>>   TELDUM   = telephone questioning
>>>   MAILDUM  = mail questioning
>>>   RESPONSE = the outcome of interest, the study response rate
>>>   DENOM    = the number of people questioned
>>>
>>> Study/group-level (Level 2) variables:
>>>   SOURCE   = the study identifier
>>>   YEAR     = year of study
>>>   SALIENCY = how salient the questionnaire was (0 to 2)
>>>   RESPISRR = the way the response rate was calculated
>>>
>>>
>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>>
>>> SUCCESS <- as.integer(RESPISRR * DENOM)
>>> y <- cbind(SUCCESS, DENOM - SUCCESS)
>>>
>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>> family=binomial(link=logit))
>>>
>>>
>>> Joop then adds a couple of Level 1 variables (Table 6.3):
>>>
>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>> family=binomial(link=logit))
>>>
>>>
>>> He then says that these two Level 1 variables should be allowed to
>>> vary across studies (varying slopes). When I try to fit what I
>>> believe to be the correct model, I get an error:
>>>
>>>
>>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (TELDUM | SOURCE) +
>>> (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>>
>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>
>>>
>>> Can anyone tell me what I am doing wrong here? Thanks so much in
>>> advance.
>>>
>>> Brant Inman
>>> Duke University Medical Center
>>>
>>>
>>>
>>> The views expressed in this message and any annex are purely those
>>> of the writer and may not be regarded as stating an official
>>> position of INBO, as long as the message is not confirmed by a duly
>>> signed document.
>>>
>>>
>>
>
>
>--
>Ben Bolker
>Associate professor, Biology Dep't, Univ. of Florida
>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
>