[R-sig-ME] Multilevel logistic regression
Brant Inman
brant.inman at me.com
Mon Mar 23 18:40:32 CET 2009
If I am working with a Windows version of R 2.8.1, where would the src folder be?
In C:\Program Files\R\R-2.8.1\library\lme4, which is where my lme4 installation lives, I have no folder called src.
Brant
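
(The installed Windows binary of lme4 does not contain the C sources; the src folder only exists in the source package on CRAN. A minimal sketch of fetching and unpacking it from within R, assuming a CRAN mirror is configured and writing into a temporary directory:)

## Fetch the lme4 *source* package (the binary install has no src/ directory)
dl <- download.packages("lme4", destdir = tempdir(), type = "source")
## download.packages() returns a matrix holding the package name and the tarball path
untar(dl[1, 2], exdir = tempdir())
list.files(file.path(tempdir(), "lme4", "src"))   # lmer.c should be listed here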
------------------------------------------------
On Monday, March 23, 2009, at 11:45AM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>On Mon, Mar 23, 2009 at 10:05 AM, Brant Inman <brant.inman at me.com> wrote:
>
>> Thanks for everyone's explanations. I truly appreciate it. Since I have never really delved into S4 objects like those lmer creates, I now have a reason to crack open Chambers' green book and figure out how to track lmer down to its depths to modify the error message line.
>
>Actually you don't need to learn about S4 classes and methods to be
>able to bypass this error message. If you search in the source code
>for a string that begins with "q = " you will find that the error
>message is generated in the C function called update_u in the
>lme4/src/lmer.c file.
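
(For concreteness, a minimal sketch of that search done from R, assuming the lme4 source tarball has already been unpacked so that lme4/src/lmer.c sits under the working directory; the path is illustrative:)

## Find the line(s) in the C source that build the "q = ... > n = ..." error message
src  <- readLines("lme4/src/lmer.c")
hits <- grep("q = ", src, fixed = TRUE)
data.frame(line = hits, text = src[hits])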
>
>> This thread is why I think R is so great: as long as your question is not too terribly dumb you get advice from the experts!
>>
>> Cheers,
>>
>> Brant
>>
>> -----------------------------------------
>>
>> On Sunday, March 22, 2009, at 12:41PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>>On Sun, Mar 22, 2009 at 9:22 AM, Brant Inman <brant.inman at me.com> wrote:
>>>> Ben,
>>>
>>>> Thanks for confirming my findings. Two questions for you:
>>>
>>>> 1) Is it normal that lmer does NOT find the following 3 formulae to be equivalent?
>>>
>>>> f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
>>>> f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
>>>> f3 <- lmer( success ~ covariates + (1 | group), data=long ...
>>>
>>>The code in lmer that counts the "number of observations" is too
>>>simplistic. It uses the length of the vector of responses, which is
>>>appropriate except for models like f1. What should be done in that
>>>case is to evaluate the sum of the number of cases represented by
>>>each row of the data.
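
(A small illustration of the two counts, using made-up aggregated data; the column names are only for the example:)

## One row per group, many Bernoulli trials per row
wide <- data.frame(success = c(30, 12, 45), fail = c(5, 8, 10))
nrow(wide)                      # 3   -- the length of the response, what lmer counts now
sum(wide$success + wide$fail)   # 110 -- the number of binary outcomes those rows represent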
>>>
>>>Tracking such things down is not terribly easy. The "initialize"
>>>expression from a glm family is "not unlike" a gross hack. It is an
>>>expression (not a function) that is evaluated for the sole purpose of
>>>scattering a bunch of objects around the function evaluation
>>>environment, which often ends up looking like a teenager's bedroom
>>>afterwards.
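
(This is easy to see from the binomial family object itself; a small illustration with arbitrary toy counts:)

## family()$initialize is an unevaluated expression, not a function
class(binomial()$initialize)    # "expression"

## Evaluating it in a throwaway environment scatters n, mustart, a rescaled y
## and reweighted weights into that environment
e <- new.env()
e$y <- cbind(success = c(8, 5), fail = c(2, 5))
e$weights <- rep(1, 2)
e$nobs <- 2
eval(binomial()$initialize, envir = e)
ls(e)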
>>>
>>>Yes, special purpose code could be written to detect a situation like
>>>model f1 and account for it appropriately. Doing so will mean keeping
>>>track of two, possibly different, definitions of the "number of
>>>observations". If someone is willing to produce a patch for the code
>>>I will consider it.
>>>
>>>I don't think it will be as easy to detect that model f2 represents
>>>more observations than are recorded in the data. All you know is that
>>>you are given observation weights, which could be the result of
>>>effects other than multiple "observations" per observation.
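
(A small glm illustration, on toy data, of why the weights are ambiguous: the aggregated-counts fit and the proportion-plus-weights fit are the same model, so the weights by themselves cannot reveal how many trials each row stands for.)

d <- data.frame(succ = c(8, 3, 9), fail = c(2, 7, 1), x = c(0, 1, 2))
m_counts <- glm(cbind(succ, fail) ~ x, family = binomial, data = d)
m_wts    <- glm(succ/(succ + fail) ~ x, family = binomial,
                weights = succ + fail, data = d)
all.equal(coef(m_counts), coef(m_wts))   # TRUE: identical fits, different bookkeeping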
>>>
>>>> 2) How did you hack lmer to avoid this problem and did it compute the result faster with the wide dataset than the long one?
>>>
>>>Well, you start by grepping the source code to determine where the
>>>error message is produced then you disable that part of the code.
>>>
>>>The error message is there for a reason - primarily related to linear
>>>mixed models. Sometimes people get carried away and define random
>>>effects with respect to a factor that has a distinct level for each
>>>observation. You can estimate the parameters in such a model but not
>>>uniquely. The variance due to such a "one level per observation"
>>>factor is completely confounded with the variance of the
>>>"per-observation" noise term. People have reported results from such
>>>models fit by lme so I wanted to protect against that by flagging such
>>>models in lme4. One could adopt a "caveat emptor" approach and say
>>>that the user is responsible for ensuring that the model they fit is
>>>mathematically reasonable, so if they get nonsense results for a
>>>nonsense model it's their fault. Unfortunately, that ends up
>>>reflecting badly on lme4 or the R project. People fail to distinguish
>>>between a package in R and R in general so this type of problem is
>>>reported as a flaw in R, often by people who want to convince
>>>companies to pony up large amounts of money for commercial software
>>>licenses.
>>>
>>>Hence, I have opted for the more conservative approach of throwing an
>>>error in such cases. As Ben has pointed out, lme4 is open source so
>>>if you want to modify it to fit a particular type of model that
>>>currently produces an error, you have the opportunity of doing so.
>>>
>>>
>>>> ----------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
>>>>> I tried this out with a hacked version of lme4 that removes
>>>>>the test on the number of observations, and get reasonable answers
>>>>>for f3, to wit:
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev. Corr
>>>>> SOURCE (Intercept) 0.78845 0.88795
>>>>> TELDUM 0.24216 0.49210 -0.281
>>>>> MAILDUM 0.52215 0.72260 -0.365 0.317
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>Fixed effects:
>>>>> Estimate Std. Error z value Pr(>|z|)
>>>>>(Intercept) 1.12762 0.19519 5.777 7.6e-09 ***
>>>>>RESPISRR 0.20714 0.21497 0.964 0.335258
>>>>>TELDUM -0.20251 0.09141 -2.215 0.026737 *
>>>>>MAILDUM -0.56094 0.14708 -3.814 0.000137 ***
>>>>>---
>>>>>
>>>>> The random effects (esp for telephone/mail) are a little different.
>>>>>It doesn't look like the software used in the chapter estimates
>>>>>correlations among random effects?
>>>>>
>>>>>## Table 6.4  Models for response rates in different conditions
>>>>>##                    conditions fixed    conditions random
>>>>>## Fixed part         coeff. (s.e.)       coeff. (s.e.)
>>>>>## Predictor
>>>>>##   intercept         0.90 (.14)          1.17 (.21)
>>>>>##   resptype          0.53 (.06)          0.20 (.23)
>>>>>##   telephone        -0.16 (.02)         -0.20 (.10)
>>>>>##   mail             -0.49 (.03)         -0.58 (.16)
>>>>>## Random part
>>>>>##   intercept1        1.00                1.00
>>>>>##   intercept2        0.86 (.18)          0.87 (.20)
>>>>>##   telephone                             0.26 (.08)
>>>>>##   mail                                  0.59 (.20)
>>>>>
>>>>>Eliminating the correlation among random effects:
>>>>>
>>>>>f3B <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>>>>>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
>>>>> family=binomial, data=wide)
>>>>>summary(f3B)
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev.
>>>>> SOURCE (Intercept) 0.72203 0.84972
>>>>> SOURCE TELDUM 0.23582 0.48562
>>>>> SOURCE MAILDUM 0.43923 0.66274
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>
>>>>> telephone/mail random effects still estimated higher.
>>>>>Tried nAGQ=6 (crank up Gauss-Hermite quadrature):
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev.
>>>>> SOURCE (Intercept) 0.74275 0.86183
>>>>> SOURCE TELDUM 0.24565 0.49563
>>>>> SOURCE MAILDUM 0.28567 0.53448
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>Brings mail RE down (*below* PQL est.) but telephone RE is still high.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Brant Inman wrote:
>>>>>> Thierry et al:
>>>>>>
>>>>>> I think I solved my problem and it provides some insight into the way
>>>>>> lmer handles binomial data, so I will share the findings here. First,
>>>>>> I have made two datasets available on the web, a long format and a
>>>>>> wide format version of the "metaresp" data of Joop Hox. They can be
>>>>>> found here:
>>>>>>
>>>>>> http://www.duke.edu/~bi6/
>>>>>>
>>>>>> Hox has a draft of the chapter of interest that discusses the
>>>>>> metaresp dataset and the modeling process/problem that I am trying to
>>>>>> solve. Note that the results that I am trying to reproduce are in
>>>>>> Tables 6.3 and 6.4. The chapter can be found at:
>>>>>>
>>>>>> http://www.geocities.com/joophox/papers/chap6.pdf
>>>>>>
>>>>>> Now here are the models that I fit with lmer. Assume that the wide
>>>>>> version of the data is called "wide" and the long version "long".
>>>>>> -----------------------------
>>>>>>
>>>>>> y <- cbind(wide$SUCCESS, wide$FAIL)
>>>>>>
>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide)
>>>>>> summary(f1)
>>>>>>
>>>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>>>>            family=binomial, data=wide)
>>>>>> summary(f2)
>>>>>>
>>>>>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM +
>>>>>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=wide)
>>>>>> summary(f3)
>>>>>>
>>>>>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM +
>>>>>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=long)
>>>>>> summary(f4)
>>>>>>
>>>>>> -------------------------------
>>>>>>
>>>>>> Models f1, f2, and f4 work and reproduce the results of Hox. Model
>>>>>> f4 takes a hell of a long time to compute, but it seems to give the
>>>>>> expected results. Model f3, which I assumed (wrongly) would be the
>>>>>> same as f4, does not work. Instead, when it is run, you get the
>>>>>> error message:
>>>>>>
>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>
>>>>>> I guess the question that I have now is: did I do something wrong
>>>>>> with model f3 or is lmer doing something unusual? My assumption that
>>>>>> models f3 and f4 were the same comes from MASS4 p190 where Ripley
>>>>>> describes the glm function for logistic regression.
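
(The glm version of the equivalence Brant is referring to is easy to check on invented toy data: one row per group with counts versus one row per trial give binomial log-likelihoods that differ only by a constant, hence identical coefficient estimates and standard errors.)

wide <- data.frame(x = c(0, 1), succ = c(7, 4), fail = c(3, 6))
long <- data.frame(x = rep(wide$x, times = wide$succ + wide$fail),
                   y = c(rep(1, 7), rep(0, 3), rep(1, 4), rep(0, 6)))
m_wide <- glm(cbind(succ, fail) ~ x, family = binomial, data = wide)
m_long <- glm(y ~ x, family = binomial, data = long)
cbind(wide = coef(m_wide), long = coef(m_long))   # the two columns agree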
>>>>>>
>>>>>> I very much appreciate any insight.
>>>>>>
>>>>>> Brant
>>>>>>
>>>>>> #####################################################################################
>>>>>>
>>>>>>
>>>>>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>>>>>> <Thierry.ONKELINX at inbo.be> wrote:
>>>>>>> Dear Brant,
>>>>>>>
>>>>>>> The model is too complex. You have at most three observations for
>>>>>>> each level of the random effect. Allowing for a random intercept
>>>>>>> and two random slopes does not make much sense then, does it?
>>>>>>>
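
(A quick way to check this on the wide data described elsewhere in this thread, where each study is identified by SOURCE:)

## Counts of how many rows each study contributes
table(table(wide$SOURCE))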
>>>>>>> HTH,
>>>>>>>
>>>>>>> Thierry
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------
>>>>>>> ir. Thierry Onkelinx
>>>>>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
>>>>>>> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
>>>>>>> Gaverstraat 4, 9500 Geraardsbergen, Belgium
>>>>>>> tel. +32 54/436 185
>>>>>>> Thierry.Onkelinx at inbo.be
>>>>>>> www.inbo.be
>>>>>>>
>>>>>>> To call in the statistician after the experiment is done may be no
>>>>>>> more than asking him to perform a post-mortem examination: he may
>>>>>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>>>>>> Fisher
>>>>>>>
>>>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>>>
>>>>>>> The combination of some data and an aching desire for an answer
>>>>>>> does not ensure that a reasonable answer can be extracted from a
>>>>>>> given body of data. ~ John Tukey
>>>>>>>
>>>>>>> -----Original message-----
>>>>>>> From: r-sig-mixed-models-bounces at r-project.org
>>>>>>> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Brant Inman
>>>>>>> Sent: Thursday, 19 March 2009, 3:11
>>>>>>> To: r-sig-mixed-models at r-project.org
>>>>>>> Subject: [R-sig-ME] Multilevel logistic regression
>>>>>>>
>>>>>>>
>>>>>>> lmer Experts:
>>>>>>>
>>>>>>> I am trying to use lmer to duplicate the results found in Joop
>>>>>>> Hox's book "Multilevel Analysis: Techniques and Applications" (2002).
>>>>>>> In chapter 6 of his book he shows an example of multilevel logistic
>>>>>>> regression for a meta-analysis of survey response rates. The data
>>>>>>> are available in the file "metaresp.xls" at his website:
>>>>>>>
>>>>>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>>>>>>
>>>>>>> The dataset includes the following variables of interest:
>>>>>>>
>>>>>>> Individual level (Level 1) variables:
>>>>>>>   TELDUM   = telephone questioning
>>>>>>>   MAILDUM  = mail questioning
>>>>>>>   RESPONSE = the outcome of interest, the study response rate
>>>>>>>   DENOM    = the number of people questioned
>>>>>>>
>>>>>>> Study/group level (Level 2) variables:
>>>>>>>   SOURCE   = the study identifier
>>>>>>>   YEAR     = year of study
>>>>>>>   SALIENCY = how salient the questionnaire was (0 to 2)
>>>>>>>   RESPISRR = the way the response rate was calculated
>>>>>>>
>>>>>>>
>>>>>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>>>>>>
>>>>>>> SUCCESS <- as.integer(RESPONSE*DENOM)
>>>>>>> y <- cbind(SUCCESS, DENOM-SUCCESS)
>>>>>>>
>>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>>>>>> family=binomial(link=logit))
>>>>>>>
>>>>>>>
>>>>>>> Joop then adds a couple Level 1 variables (Table 6.3):
>>>>>>>
>>>>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>>>>> family=binomial(link=logit))
>>>>>>>
>>>>>>>
>>>>>>> He then says that these two Level 1 variables should be allowed to
>>>>>>> vary across studies (varying slopes). When I try to fit what I
>>>>>>> believe to be the correct model, I get an error
>>>>>>>
>>>>>>>
>>>>>>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (TELDUM | SOURCE) +
>>>>>>>            (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>>>>>>
>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>>
>>>>>>>
>>>>>>> Can anyone tell me what I am doing wrong here? Thanks so much in
>>>>>>> advance.
>>>>>>>
>>>>>>> Brant Inman Duke University Medical Center
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>Ben Bolker
>>>>>Associate professor, Biology Dep't, Univ. of Florida
>>>>>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>