[R-sig-ME] Multilevel logistic regression

Mon Mar 23 16:05:18 CET 2009

Thanks for everyone's explanations.  I truly appreciate it.  Since I have never really delved into S4 objects like lmer, I now have a reason now to crack open Chambers' green book and figure out how to track lmer down to its depths to modify the error message line.

This thread is why I think R is so great: as long as your question is not too terribly dumb you get advice from the experts!

Cheers,

Brant

-----------------------------------------

On Sunday, March 22, 2009, at 12:41PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>On Sun, Mar 22, 2009 at 9:22 AM, Brant Inman <brant.inman at me.com> wrote:
>> Ben,
>
>> Thanks for confirming my findings. Two questions for you:
>
>> 1) Is it normal that lmer does NOT find the following 3 formulae to be equivalent?
>
>> f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
>> f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
>> f3 <- lmer( success ~ covariates + (1 | group), data=long ...
>
>The code in lmer to count the "number of observations" is too
>simplistic.  It uses the length of the vector of responses which is
>appropriate except for the case of models like f1.  What should be
>done in that case is the evaluate the sum of the number of cases
>represented by each row of the data.
>
>Tracking such things down is not terribly easy.  The "initialize"
>expression from a glm family is "not unlike" a gross hack. It is an
>expression (not a function) that is evaluated for the sole purpose of
>scattering a bunch of objects around the function evaluation
>environment, which often ends up looking like a teenager's bedroom
>afterwards.
>
>Yes, special purpose code could be written to detect a situation like
>model f1 and account for it appropriately.  Doing so will mean keeping
>track of two, possibly different, definitions of the "number of
>observations".  If someone is willing to produce a patch for the code
>I will consider it.
>
>I don't think it will be as easy to detect that model f2 represents
>more observations than are recorded in the data.  All you know is that
>you are given observation weights, which could be the result of
>effects other than multiple "observations" per observation.
>
>> 2) How did you hack lmer to avoid this problem and did it compute the result faster with the wide dataset than the long one?
>
>Well, you start by grepping the source code to determine where the
>error message is produced then you disable that part of the code.
>
>The error message is there for a reason - primarily related to linear
>mixed models.  Sometimes people get carried away and define random
>effects with respect to a factor that has a distinct level for each
>observation.  You can estimate the parameters in such a model but not
>uniquely.  The variance due to such a "one level per observation"
>factor is completely confounded with the variance of the
>"per-observation" noise term.  People have reported results from such
>models fit by lme so I wanted to protect against that by flagging such
>models in lme4.  One could adopt a "caveat emptor" approach and say
>that the user is responsible for ensuring that they model that they
>fit is mathematically reasonable so it they get nonsense results for a
>nonsense model it's their fault.  Unfortunately, that ends up
>reflecting badly on lme4 or the R project.  People fail to distinguish
>between a package in R and R in general so this type of problem is
>reported as a flaw in R, often by people who want to convince
>companies to pony up large amounts of money for commercial software
>licenses.
>
>Hence, I have opted for the more conservative approach of throwing an
>error in such cases.   As Ben has pointed out, lme4 is open source so
>if you want to modify it to fit a particular type of model that
>currently produces an error, you have the opportunity of doing so.
>
>
>> ----------------------------------------------------------------------------------------------------------------------------
>>
>> On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
>>>   I tried this out with a hacked version of lme4 that removes
>>>the test on the number of observations, and get reasonable answers
>>>for f3, to wit:
>>>
>>>Random effects:
>>> Groups Name        Variance Std.Dev. Corr
>>> SOURCE (Intercept) 0.78845  0.88795
>>>        TELDUM      0.24216  0.49210  -0.281
>>>        MAILDUM     0.52215  0.72260  -0.365  0.317
>>>Number of obs: 105, groups: SOURCE, 48
>>>
>>>Fixed effects:
>>>            Estimate Std. Error z value Pr(>|z|)
>>>(Intercept)  1.12762    0.19519   5.777  7.6e-09 ***
>>>RESPISRR     0.20714    0.21497   0.964 0.335258
>>>TELDUM      -0.20251    0.09141  -2.215 0.026737 *
>>>MAILDUM     -0.56094    0.14708  -3.814 0.000137 ***
>>>---
>>>
>>>  The random effects (esp for telephone/mail) are a little different.
>>>It doesn't look like the software used in the chapter estimates
>>>correlations among random effects?
>>>
>>>## Table 6.4 Models for response rates in different conditions
>>>## Fixed part           conditions fixed       conditions random
>>>##                      coeff. (s.e.)          coeff. (s.e.)
>>>## Predictor
>>>## intercept            0.90 (.14)             1.17 (.21)
>>>## resptype             0.53 (.06)             0.20 (.23)
>>>## telephone            -0.16 (.02)            -0.20 (.10)
>>>## mail                 -0.49 (.03)            -0.58 (.16)
>>>## Random part
>>>## intercept1           1.00                   1.00
>>>## intercept2           0.86 (.18)             0.87 (.20)
>>>## telephone                                   0.26 (.08)
>>>## mail                                        0.59 (.20)
>>>
>>>Eliminating the correlation among random effects:
>>>
>>>f3B <- lmer(y  ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>>>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
>>>                family=binomial, data=wide)
>>>summary(f3B)
>>>
>>>Random effects:
>>> Groups Name        Variance Std.Dev.
>>> SOURCE (Intercept) 0.72203  0.84972
>>> SOURCE TELDUM      0.23582  0.48562
>>> SOURCE MAILDUM     0.43923  0.66274
>>>Number of obs: 105, groups: SOURCE, 48
>>>
>>>
>>>  telephone/mail random effects still estimated higher.
>>>Tried nAGQ=6 (crank up Gauss-Hermite quadrature):
>>>
>>>Random effects:
>>> Groups Name        Variance Std.Dev.
>>> SOURCE (Intercept) 0.74275  0.86183
>>> SOURCE TELDUM      0.24565  0.49563
>>> SOURCE MAILDUM     0.28567  0.53448
>>>Number of obs: 105, groups: SOURCE, 48
>>>
>>>Brings mail RE down (*below* PQL est.) but telephone RE is still high.
>>>
>>>
>>>
>>>
>>>Brant Inman wrote:
>>>> Thierry et al:
>>>>
>>>> I think I solved my problem and it provides some insight into the way
>>>> lmer handles binomial data, so I will share the findings here. First,
>>>> I have made two datasets available on the web, a long format and a
>>>> wide format version of the "metaresp" data of Joop Hox..  They can be
>>>> found here
>>>>
>>>> http://www.duke.edu/~bi6/
>>>>
>>>> Hox has a draft of the chapter of interest that discusses the
>>>> metaresp dataset and the modeling process/problem that I am trying to
>>>> solve.  Note that the results that I am trying to reproduce are in
>>>> Tables 6.3 and 6.4.  The chapter can be found at:
>>>>
>>>> http://www.geocities.com/joophox/papers/chap6.pdf
>>>>
>>>> Now here are the models that I fit with lmer.  Assume that the wide
>>>> version of the data is called "wide" and the long version "long".
>>>> -----------------------------
>>>>
>>>> y <- cbind(wide$SUCCESS, wide$FAIL)
>>>>
>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide)
>>>> summary(f1)
>>>>
>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>> family=binomial, data=wide) summary(f2)
>>>>
>>>> f3 <- lmer(y ~  ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM + MAILDUM
>>>> | SOURCE), family=binomial, data=wide) summary(f3)
>>>>
>>>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM +
>>>> MAILDUM | SOURCE), family=binomial, data=long) summary(f4)
>>>>
>>>> -------------------------------
>>>>
>>>> Models f1, f2, and f4 work and reproduce the results of Hox.  Model
>>>> f4 takes a hell of a long time to compute, but it seems to give the
>>>> expected results.  Model f3, which I assumed (wrongly) would be the
>>>> same as f4, does not work.  Instead, when it is run, you get the
>>>> error message:
>>>>
>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>
>>>> I guess the question that I have now is: did I do something wrong
>>>> with model f3 or is lmer doing something unusual?  My assumption that
>>>> models f3 and f4 were the same comes from MASS4 p190 where Ripley
>>>> describes the glm function for logistic regression.
>>>>
>>>> I very much appreciate any insight.
>>>>
>>>> Brant
>>>>
>>>> #####################################################################################
>>>>
>>>>
>>>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>>>> <Thierry.ONKELINX at inbo.be> wrote:
>>>>> Dear Brant,
>>>>>
>>>>> The model is too complex. You have maximum three observations for
>>>>> each level of the random effect. Allowing for a random intercept
>>>>> and two random slopes does not make much sense then. Does it?
>>>>>
>>>>> HTH,
>>>>>
>>>>> Thierry
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>  ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>> Research Institute for Nature and Forest Cel biometrie,
>>>>> methodologie en kwaliteitszorg / Section biometrics, methodology
>>>>> and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium
>>>>> tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be
>>>>>
>>>>> To call in the statistician after the experiment is done may be no
>>>>> more than asking him to perform a post-mortem examination: he may
>>>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>>>> Fisher
>>>>>
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>
>>>>> The combination of some data and an aching desire for an answer
>>>>> does not ensure that a reasonable answer can be extracted from a
>>>>> given body of data. ~ John Tukey
>>>>>
>>>>> -----Oorspronkelijk bericht----- Van:
>>>>> r-sig-mixed-models-bounces at r-project.org
>>>>> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Brant
>>>>> Inman Verzonden: donderdag 19 maart 2009 3:11 Aan:
>>>>> r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] Multilevel
>>>>> logistic regression
>>>>>
>>>>>
>>>>> lmer Experts:
>>>>>
>>>>> I am trying to use lmer to duplicate the results found in Joop
>>>>> Hox's book "Multilevel Analysis: technique and applications" 2002.
>>>>> In chapter 6 of his book he shows an example of multilevel logistic
>>>>>  regression for a meta-analysis of survey response rates.  The data
>>>>> are available in the file "metaresp.xls" at his website:
>>>>>
>>>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>>>>
>>>>> The dataset includes the following variables of interest:
>>>>>
>>>>> Individual level (Level 1) variables: TELDUM         = telephone
>>>>> questioning MAILDUM  = mail questioning RESPONSE = the outcome of
>>>>> interest, the study response rate DENOM    = the number of people
>>>>> questioned
>>>>>
>>>>> Study/group level (Level 2) variables: SOURCE        = the study
>>>>> identifier YEAR      = year of study SALIENCY = how salient the
>>>>> questionnaire was (0 to 2) RESPISRR = the way the response rate was
>>>>> calculated
>>>>>
>>>>>
>>>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>>>>
>>>>> SUCCESS <- as.integer(RESPISRR*DENOM) y     <- cbind(SUCCESS,
>>>>> DENOM-SUCCESS)
>>>>>
>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>>>> family=binomial(link=logit))
>>>>>
>>>>>
>>>>> Joop then adds a couple Level 1 variables (Table 6.3):
>>>>>
>>>>> f2 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (1 | SOURCE),
>>>>> family=binomial(link=logit))
>>>>>
>>>>>
>>>>> He then says that these two Level 1 variables should be allowed to
>>>>>  vary across studies (varying slopes).  When I try to fit what I
>>>>> believe to be the correct model, I get an error
>>>>>
>>>>>
>>>>> f3 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (TELNUM | SOURCE) +
>>>>> (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>>>>
>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>
>>>>>
>>>>> Can anyone tell me what I am doing wrong here?  Thanks so much in
>>>>>  advance.
>>>>>
>>>>> Brant Inman Duke University Medical Center
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
>>>>> schrijver weer en binden het INBO onder geen enkel beding, zolang
>>>>> dit bericht niet bevestigd is door een geldig ondertekend document.
>>>>> The views expressed in  this message and any annex are purely those
>>>>> of the writer and may not be regarded as stating an official
>>>>> position of INBO, as long as the message is not confirmed by a duly
>>>>>  signed document.
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>>--
>>>Ben Bolker
>>>Associate professor, Biology Dep't, Univ. of Florida
>>>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>
>>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>