[R-sig-ME] Multilevel logistic regression
Brant Inman
brant.inman at me.com
Mon Mar 23 18:40:32 CET 2009
If I am working with a Windows version of R 2.8.1, where would the src folder be?
In C:\Program Files\R\R-2.8.1\library\lme4, which is where my lme4 installation lives, I have no folder called src.
Brant
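
(The installed Windows binary of lme4 does not contain the C sources; the src folder only exists in the source package on CRAN. A minimal sketch of fetching and unpacking it from within R, assuming a CRAN mirror is configured and writing into a temporary directory:)

## Fetch the lme4 *source* package (the binary install has no src/ directory)
dl <- download.packages("lme4", destdir = tempdir(), type = "source")
## download.packages() returns a matrix holding the package name and the tarball path
untar(dl[1, 2], exdir = tempdir())
list.files(file.path(tempdir(), "lme4", "src"))   # lmer.c should be listed here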
------------------------------------------------
On Monday, March 23, 2009, at 11:45AM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>On Mon, Mar 23, 2009 at 10:05 AM, Brant Inman <brant.inman at me.com> wrote:
>
>> Thanks for everyone's explanations. I truly appreciate it. Since I have never really delved into S4 objects like those lmer creates, I now have a reason to crack open Chambers' green book and figure out how to track lmer down to its depths to modify the error message line.
>
>Actually you don't need to learn about S4 classes and methods to be
>able to bypass this error message. If you search in the source code
>for a string that begins with "q = " you will find that the error
>message is generated in the C function called update_u in the
>lme4/src/lmer.c file.
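
(For concreteness, a minimal sketch of that search done from R, assuming the lme4 source tarball has already been unpacked so that lme4/src/lmer.c sits under the working directory; the path is illustrative:)

## Find the line(s) in the C source that build the "q = ... > n = ..." error message
src  <- readLines("lme4/src/lmer.c")
hits <- grep("q = ", src, fixed = TRUE)
data.frame(line = hits, text = src[hits])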
>
>> This thread is why I think R is so great: as long as your question is not too terribly dumb you get advice from the experts!
>>
>> Cheers,
>>
>> Brant
>>
>> -----------------------------------------
>>
>> On Sunday, March 22, 2009, at 12:41PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>>On Sun, Mar 22, 2009 at 9:22 AM, Brant Inman <brant.inman at me.com> wrote:
>>>> Ben,
>>>
>>>> Thanks for confirming my findings. Two questions for you:
>>>
>>>> 1) Is it normal that lmer does NOT find the following 3 formulae to be equivalent?
>>>
>>>> f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
>>>> f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
>>>> f3 <- lmer( success ~ covariates + (1 | group), data=long ...
>>>
>>>The code in lmer that counts the "number of observations" is too
>>>simplistic. It uses the length of the vector of responses, which is
>>>appropriate except for models like f1. What should be done in that
>>>case is to evaluate the sum of the number of cases represented by
>>>each row of the data.
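
(A small illustration of the two counts, using made-up aggregated data; the column names are only for the example:)

## One row per group, many Bernoulli trials per row
wide <- data.frame(success = c(30, 12, 45), fail = c(5, 8, 10))
nrow(wide)                      # 3   -- the length of the response, what lmer counts now
sum(wide$success + wide$fail)   # 110 -- the number of binary outcomes those rows represent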
>>>
>>>Tracking such things down is not terribly easy. The "initialize"
>>>expression from a glm family is "not unlike" a gross hack. It is an
>>>expression (not a function) that is evaluated for the sole purpose of
>>>scattering a bunch of objects around the function evaluation
>>>environment, which often ends up looking like a teenager's bedroom
>>>afterwards.
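
(This is easy to see from the binomial family object itself; a small illustration with arbitrary toy counts:)

## family()$initialize is an unevaluated expression, not a function
class(binomial()$initialize)    # "expression"

## Evaluating it in a throwaway environment scatters n, mustart, a rescaled y
## and reweighted weights into that environment
e <- new.env()
e$y <- cbind(success = c(8, 5), fail = c(2, 5))
e$weights <- rep(1, 2)
e$nobs <- 2
eval(binomial()$initialize, envir = e)
ls(e)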
>>>
>>>Yes, special purpose code could be written to detect a situation like
>>>model f1 and account for it appropriately. Doing so will mean keeping
>>>track of two, possibly different, definitions of the "number of
>>>observations". If someone is willing to produce a patch for the code
>>>I will consider it.
>>>
>>>I don't think it will be as easy to detect that model f2 represents
>>>more observations than are recorded in the data. All you know is that
>>>you are given observation weights, which could be the result of
>>>effects other than multiple "observations" per observation.
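
(A small glm illustration, on toy data, of why the weights are ambiguous: the aggregated-counts fit and the proportion-plus-weights fit are the same model, so the weights by themselves cannot reveal how many trials each row stands for.)

d <- data.frame(succ = c(8, 3, 9), fail = c(2, 7, 1), x = c(0, 1, 2))
m_counts <- glm(cbind(succ, fail) ~ x, family = binomial, data = d)
m_wts    <- glm(succ/(succ + fail) ~ x, family = binomial,
                weights = succ + fail, data = d)
all.equal(coef(m_counts), coef(m_wts))   # TRUE: identical fits, different bookkeeping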
>>>
>>>> 2) How did you hack lmer to avoid this problem and did it compute the result faster with the wide dataset than the long one?
>>>
>>>Well, you start by grepping the source code to determine where the
>>>error message is produced then you disable that part of the code.
>>>
>>>The error message is there for a reason - primarily related to linear
>>>mixed models. Sometimes people get carried away and define random
>>>effects with respect to a factor that has a distinct level for each
>>>observation. You can estimate the parameters in such a model but not
>>>uniquely. The variance due to such a "one level per observation"
>>>factor is completely confounded with the variance of the
>>>"per-observation" noise term. People have reported results from such
>>>models fit by lme so I wanted to protect against that by flagging such
>>>models in lme4. One could adopt a "caveat emptor" approach and say
>>>that the user is responsible for ensuring that the model they fit is
>>>mathematically reasonable, so if they get nonsense results for a
>>>nonsense model it's their fault. Unfortunately, that ends up
>>>reflecting badly on lme4 or the R project. People fail to distinguish
>>>between a package in R and R in general so this type of problem is
>>>reported as a flaw in R, often by people who want to convince
>>>companies to pony up large amounts of money for commercial software
>>>licenses.
>>>
>>>Hence, I have opted for the more conservative approach of throwing an
>>>error in such cases. As Ben has pointed out, lme4 is open source so
>>>if you want to modify it to fit a particular type of model that
>>>currently produces an error, you have the opportunity of doing so.
>>>
>>>
>>>> ----------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
>>>>> I tried this out with a hacked version of lme4 that removes
>>>>>the test on the number of observations, and get reasonable answers
>>>>>for f3, to wit:
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev. Corr
>>>>> SOURCE (Intercept) 0.78845 0.88795
>>>>> TELDUM 0.24216 0.49210 -0.281
>>>>> MAILDUM 0.52215 0.72260 -0.365 0.317
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>Fixed effects:
>>>>> Estimate Std. Error z value Pr(>|z|)
>>>>>(Intercept) 1.12762 0.19519 5.777 7.6e-09 ***
>>>>>RESPISRR 0.20714 0.21497 0.964 0.335258
>>>>>TELDUM -0.20251 0.09141 -2.215 0.026737 *
>>>>>MAILDUM -0.56094 0.14708 -3.814 0.000137 ***
>>>>>---
>>>>>
>>>>> The random effects (esp for telephone/mail) are a little different.
>>>>>It doesn't look like the software used in the chapter estimates
>>>>>correlations among random effects?
>>>>>
>>>>>## Table 6.4  Models for response rates in different conditions
>>>>>##                    conditions fixed    conditions random
>>>>>## Fixed part         coeff. (s.e.)       coeff. (s.e.)
>>>>>## Predictor
>>>>>##   intercept         0.90 (.14)          1.17 (.21)
>>>>>##   resptype          0.53 (.06)          0.20 (.23)
>>>>>##   telephone        -0.16 (.02)         -0.20 (.10)
>>>>>##   mail             -0.49 (.03)         -0.58 (.16)
>>>>>## Random part
>>>>>##   intercept1        1.00                1.00
>>>>>##   intercept2        0.86 (.18)          0.87 (.20)
>>>>>##   telephone                             0.26 (.08)
>>>>>##   mail                                  0.59 (.20)
>>>>>
>>>>>Eliminating the correlation among random effects:
>>>>>
>>>>>f3B <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>>>>>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
>>>>> family=binomial, data=wide)
>>>>>summary(f3B)
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev.
>>>>> SOURCE (Intercept) 0.72203 0.84972
>>>>> SOURCE TELDUM 0.23582 0.48562
>>>>> SOURCE MAILDUM 0.43923 0.66274
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>
>>>>> telephone/mail random effects still estimated higher.
>>>>>Tried nAGQ=6 (crank up Gauss-Hermite quadrature):
>>>>>
>>>>>Random effects:
>>>>> Groups Name Variance Std.Dev.
>>>>> SOURCE (Intercept) 0.74275 0.86183
>>>>> SOURCE TELDUM 0.24565 0.49563
>>>>> SOURCE MAILDUM 0.28567 0.53448
>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>
>>>>>Brings mail RE down (*below* PQL est.) but telephone RE is still high.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Brant Inman wrote:
>>>>>> Thierry et al:
>>>>>>
>>>>>> I think I solved my problem and it provides some insight into the way
>>>>>> lmer handles binomial data, so I will share the findings here. First,
>>>>>> I have made two datasets available on the web, a long format and a
>>>>>> wide format version of the "metaresp" data of Joop Hox. They can be
>>>>>> found here:
>>>>>>
>>>>>> http://www.duke.edu/~bi6/
>>>>>>
>>>>>> Hox has a draft of the chapter of interest that discusses the
>>>>>> metaresp dataset and the modeling process/problem that I am trying to
>>>>>> solve. Note that the results that I am trying to reproduce are in
>>>>>> Tables 6.3 and 6.4. The chapter can be found at:
>>>>>>
>>>>>> http://www.geocities.com/joophox/papers/chap6.pdf
>>>>>>
>>>>>> Now here are the models that I fit with lmer. Assume that the wide
>>>>>> version of the data is called "wide" and the long version "long".
>>>>>> -----------------------------
>>>>>>
>>>>>> y <- cbind(wide$SUCCESS, wide$FAIL)
>>>>>>
>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide)
>>>>>> summary(f1)
>>>>>>
>>>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>>>>            family=binomial, data=wide)
>>>>>> summary(f2)
>>>>>>
>>>>>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM +
>>>>>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=wide)
>>>>>> summary(f3)
>>>>>>
>>>>>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM +
>>>>>>            (1 + TELDUM + MAILDUM | SOURCE), family=binomial, data=long)
>>>>>> summary(f4)
>>>>>>
>>>>>> -------------------------------
>>>>>>
>>>>>> Models f1, f2, and f4 work and reproduce the results of Hox. Model
>>>>>> f4 takes a hell of a long time to compute, but it seems to give the
>>>>>> expected results. Model f3, which I assumed (wrongly) would be the
>>>>>> same as f4, does not work. Instead, when it is run, you get the
>>>>>> error message:
>>>>>>
>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>
>>>>>> I guess the question that I have now is: did I do something wrong
>>>>>> with model f3 or is lmer doing something unusual? My assumption that
>>>>>> models f3 and f4 were the same comes from MASS4 p190 where Ripley
>>>>>> describes the glm function for logistic regression.
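
(The glm version of the equivalence Brant is referring to is easy to check on invented toy data: one row per group with counts versus one row per trial give binomial log-likelihoods that differ only by a constant, hence identical coefficient estimates and standard errors.)

wide <- data.frame(x = c(0, 1), succ = c(7, 4), fail = c(3, 6))
long <- data.frame(x = rep(wide$x, times = wide$succ + wide$fail),
                   y = c(rep(1, 7), rep(0, 3), rep(1, 4), rep(0, 6)))
m_wide <- glm(cbind(succ, fail) ~ x, family = binomial, data = wide)
m_long <- glm(y ~ x, family = binomial, data = long)
cbind(wide = coef(m_wide), long = coef(m_long))   # the two columns agree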
>>>>>>
>>>>>> I very much appreciate any insight.
>>>>>>
>>>>>> Brant
>>>>>>
>>>>>> #####################################################################################
>>>>>>
>>>>>>
>>>>>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>>>>>> <Thierry.ONKELINX at inbo.be> wrote:
>>>>>>> Dear Brant,
>>>>>>>
>>>>>>> The model is too complex. You have at most three observations for
>>>>>>> each level of the random effect. Allowing for a random intercept
>>>>>>> and two random slopes does not make much sense then, does it?
>>>>>>>
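
(A quick way to check this on the wide data described elsewhere in this thread, where each study is identified by SOURCE:)

## Counts of how many rows each study contributes
table(table(wide$SOURCE))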
>>>>>>> HTH,
>>>>>>>
>>>>>>> Thierry
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------
>>>>>>> ir. Thierry Onkelinx
>>>>>>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
>>>>>>> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
>>>>>>> Gaverstraat 4, 9500 Geraardsbergen, Belgium
>>>>>>> tel. +32 54/436 185
>>>>>>> Thierry.Onkelinx at inbo.be
>>>>>>> www.inbo.be
>>>>>>>
>>>>>>> To call in the statistician after the experiment is done may be no
>>>>>>> more than asking him to perform a post-mortem examination: he may
>>>>>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>>>>>> Fisher
>>>>>>>
>>>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>>>
>>>>>>> The combination of some data and an aching desire for an answer
>>>>>>> does not ensure that a reasonable answer can be extracted from a
>>>>>>> given body of data. ~ John Tukey
>>>>>>>
>>>>>>> -----Original message-----
>>>>>>> From: r-sig-mixed-models-bounces at r-project.org
>>>>>>> [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Brant Inman
>>>>>>> Sent: Thursday, 19 March 2009, 3:11
>>>>>>> To: r-sig-mixed-models at r-project.org
>>>>>>> Subject: [R-sig-ME] Multilevel logistic regression
>>>>>>>
>>>>>>>
>>>>>>> lmer Experts:
>>>>>>>
>>>>>>> I am trying to use lmer to duplicate the results found in Joop
>>>>>>> Hox's book "Multilevel Analysis: Techniques and Applications" (2002).
>>>>>>> In chapter 6 of his book he shows an example of multilevel logistic
>>>>>>> regression for a meta-analysis of survey response rates. The data
>>>>>>> are available in the file "metaresp.xls" at his website:
>>>>>>>
>>>>>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>>>>>>
>>>>>>> The dataset includes the following variables of interest:
>>>>>>>
>>>>>>> Individual level (Level 1) variables:
>>>>>>>   TELDUM   = telephone questioning
>>>>>>>   MAILDUM  = mail questioning
>>>>>>>   RESPONSE = the outcome of interest, the study response rate
>>>>>>>   DENOM    = the number of people questioned
>>>>>>>
>>>>>>> Study/group level (Level 2) variables:
>>>>>>>   SOURCE   = the study identifier
>>>>>>>   YEAR     = year of study
>>>>>>>   SALIENCY = how salient the questionnaire was (0 to 2)
>>>>>>>   RESPISRR = the way the response rate was calculated
>>>>>>>
>>>>>>>
>>>>>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>>>>>>
>>>>>>> SUCCESS <- as.integer(RESPONSE*DENOM)
>>>>>>> y <- cbind(SUCCESS, DENOM-SUCCESS)
>>>>>>>
>>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>>>>>> family=binomial(link=logit))
>>>>>>>
>>>>>>>
>>>>>>> Joop then adds a couple Level 1 variables (Table 6.3):
>>>>>>>
>>>>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>>>>> family=binomial(link=logit))
>>>>>>>
>>>>>>>
>>>>>>> He then says that these two Level 1 variables should be allowed to
>>>>>>> vary across studies (varying slopes). When I try to fit what I
>>>>>>> believe to be the correct model, I get an error
>>>>>>>
>>>>>>>
>>>>>>> f3 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (TELDUM | SOURCE) +
>>>>>>>            (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>>>>>>
>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>>
>>>>>>>
>>>>>>> Can anyone tell me what I am doing wrong here? Thanks so much in
>>>>>>> advance.
>>>>>>>
>>>>>>> Brant Inman Duke University Medical Center
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>Ben Bolker
>>>>>Associate professor, Biology Dep't, Univ. of Florida
>>>>>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>