[R-sig-ME] Multilevel logistic regression

Mon Mar 23 18:56:22 CET 2009

Sorry for the dumb last question. I still find some of the more technical computer/programming aspects of R a bit confusing :)   

I just realized after reading Ewe Ligges article on source code in Rnews 6/4 that my copy of R is the binary copy and therefore does not allow me to access the stuff you said. I did download the package in source format and did find the appropriate error message that I could perhaps comment out.  I guess that I will need to reinstall R from source to get myself on the right track.

Brant

-------------------------------------------------

On Monday, March 23, 2009, at 12:40PM, "Brant Inman" <brant.inman at me.com> wrote:
>
>If I am working with a windows version of R 2.8.1, where would the src folder be?
>
>In C:\Program Files\R\R-2.8.1\library\lme4, which is my lme4 installation site, I have no folder called src.
>
>
>Brant
>
>------------------------------------------------
> 
>On Monday, March 23, 2009, at 11:45AM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>On Mon, Mar 23, 2009 at 10:05 AM, Brant Inman <brant.inman at me.com> wrote:
>>
>>> Thanks for everyone's explanations.  I truly appreciate it.  Since I have never really delved into S4 objects like lmer, I now have a reason now to crack open Chambers' green book and figure out how to track lmer down to its depths to modify the error message line.
>>
>>Actually you don't need to learn about S4 classes and methods to be
>>able to bypass this error message.  If you search in the source code
>>for a string that begins with "q = " you will find that the error
>>message is generated in the C function called update_u in the
>>lme4/src/lmer.c file.
>>
>>> This thread is why I think R is so great: as long as your question is not too terribly dumb you get advice from the experts!
>>>
>>> Cheers,
>>>
>>> Brant
>>>
>>> -----------------------------------------
>>>
>>> On Sunday, March 22, 2009, at 12:41PM, "Douglas Bates" <bates at stat.wisc.edu> wrote:
>>>>On Sun, Mar 22, 2009 at 9:22 AM, Brant Inman <brant.inman at me.com> wrote:
>>>>> Ben,
>>>>
>>>>> Thanks for confirming my findings. Two questions for you:
>>>>
>>>>> 1) Is it normal that lmer does NOT find the following 3 formulae to be equivalent?
>>>>
>>>>> f1 <- lmer( cbind(success, fail) ~ covariates + (1 | group), data=wide ...
>>>>> f2 <- lmer( (success/(success+fail)) ~ covariates + (1 | group), weights=(success+fail), data=wide ...
>>>>> f3 <- lmer( success ~ covariates + (1 | group), data=long ...
>>>>
>>>>The code in lmer to count the "number of observations" is too
>>>>simplistic.  It uses the length of the vector of responses which is
>>>>appropriate except for the case of models like f1.  What should be
>>>>done in that case is the evaluate the sum of the number of cases
>>>>represented by each row of the data.
>>>>
>>>>Tracking such things down is not terribly easy.  The "initialize"
>>>>expression from a glm family is "not unlike" a gross hack. It is an
>>>>expression (not a function) that is evaluated for the sole purpose of
>>>>scattering a bunch of objects around the function evaluation
>>>>environment, which often ends up looking like a teenager's bedroom
>>>>afterwards.
>>>>
>>>>Yes, special purpose code could be written to detect a situation like
>>>>model f1 and account for it appropriately.  Doing so will mean keeping
>>>>track of two, possibly different, definitions of the "number of
>>>>observations".  If someone is willing to produce a patch for the code
>>>>I will consider it.
>>>>
>>>>I don't think it will be as easy to detect that model f2 represents
>>>>more observations than are recorded in the data.  All you know is that
>>>>you are given observation weights, which could be the result of
>>>>effects other than multiple "observations" per observation.
>>>>
>>>>> 2) How did you hack lmer to avoid this problem and did it compute the result faster with the wide dataset than the long one?
>>>>
>>>>Well, you start by grepping the source code to determine where the
>>>>error message is produced then you disable that part of the code.
>>>>
>>>>The error message is there for a reason - primarily related to linear
>>>>mixed models.  Sometimes people get carried away and define random
>>>>effects with respect to a factor that has a distinct level for each
>>>>observation.  You can estimate the parameters in such a model but not
>>>>uniquely.  The variance due to such a "one level per observation"
>>>>factor is completely confounded with the variance of the
>>>>"per-observation" noise term.  People have reported results from such
>>>>models fit by lme so I wanted to protect against that by flagging such
>>>>models in lme4.  One could adopt a "caveat emptor" approach and say
>>>>that the user is responsible for ensuring that they model that they
>>>>fit is mathematically reasonable so it they get nonsense results for a
>>>>nonsense model it's their fault.  Unfortunately, that ends up
>>>>reflecting badly on lme4 or the R project.  People fail to distinguish
>>>>between a package in R and R in general so this type of problem is
>>>>reported as a flaw in R, often by people who want to convince
>>>>companies to pony up large amounts of money for commercial software
>>>>licenses.
>>>>
>>>>Hence, I have opted for the more conservative approach of throwing an
>>>>error in such cases.   As Ben has pointed out, lme4 is open source so
>>>>if you want to modify it to fit a particular type of model that
>>>>currently produces an error, you have the opportunity of doing so.
>>>>
>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> On Saturday, March 21, 2009, at 11:52PM, "Ben Bolker" <bolker at ufl.edu> wrote:
>>>>>>   I tried this out with a hacked version of lme4 that removes
>>>>>>the test on the number of observations, and get reasonable answers
>>>>>>for f3, to wit:
>>>>>>
>>>>>>Random effects:
>>>>>> Groups Name        Variance Std.Dev. Corr
>>>>>> SOURCE (Intercept) 0.78845  0.88795
>>>>>>        TELDUM      0.24216  0.49210  -0.281
>>>>>>        MAILDUM     0.52215  0.72260  -0.365  0.317
>>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>>
>>>>>>Fixed effects:
>>>>>>            Estimate Std. Error z value Pr(>|z|)
>>>>>>(Intercept)  1.12762    0.19519   5.777  7.6e-09 ***
>>>>>>RESPISRR     0.20714    0.21497   0.964 0.335258
>>>>>>TELDUM      -0.20251    0.09141  -2.215 0.026737 *
>>>>>>MAILDUM     -0.56094    0.14708  -3.814 0.000137 ***
>>>>>>---
>>>>>>
>>>>>>  The random effects (esp for telephone/mail) are a little different.
>>>>>>It doesn't look like the software used in the chapter estimates
>>>>>>correlations among random effects?
>>>>>>
>>>>>>## Table 6.4 Models for response rates in different conditions
>>>>>>## Fixed part           conditions fixed       conditions random
>>>>>>##                      coeff. (s.e.)          coeff. (s.e.)
>>>>>>## Predictor
>>>>>>## intercept            0.90 (.14)             1.17 (.21)
>>>>>>## resptype             0.53 (.06)             0.20 (.23)
>>>>>>## telephone            -0.16 (.02)            -0.20 (.10)
>>>>>>## mail                 -0.49 (.03)            -0.58 (.16)
>>>>>>## Random part
>>>>>>## intercept1           1.00                   1.00
>>>>>>## intercept2           0.86 (.18)             0.87 (.20)
>>>>>>## telephone                                   0.26 (.08)
>>>>>>## mail                                        0.59 (.20)
>>>>>>
>>>>>>Eliminating the correlation among random effects:
>>>>>>
>>>>>>f3B <- lmer(y  ~ RESPISRR + TELDUM + MAILDUM + (1|SOURCE) +
>>>>>>(0+TELDUM|SOURCE) + (0+MAILDUM|SOURCE),
>>>>>>                family=binomial, data=wide)
>>>>>>summary(f3B)
>>>>>>
>>>>>>Random effects:
>>>>>> Groups Name        Variance Std.Dev.
>>>>>> SOURCE (Intercept) 0.72203  0.84972
>>>>>> SOURCE TELDUM      0.23582  0.48562
>>>>>> SOURCE MAILDUM     0.43923  0.66274
>>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>>
>>>>>>
>>>>>>  telephone/mail random effects still estimated higher.
>>>>>>Tried nAGQ=6 (crank up Gauss-Hermite quadrature):
>>>>>>
>>>>>>Random effects:
>>>>>> Groups Name        Variance Std.Dev.
>>>>>> SOURCE (Intercept) 0.74275  0.86183
>>>>>> SOURCE TELDUM      0.24565  0.49563
>>>>>> SOURCE MAILDUM     0.28567  0.53448
>>>>>>Number of obs: 105, groups: SOURCE, 48
>>>>>>
>>>>>>Brings mail RE down (*below* PQL est.) but telephone RE is still high.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>Brant Inman wrote:
>>>>>>> Thierry et al:
>>>>>>>
>>>>>>> I think I solved my problem and it provides some insight into the way
>>>>>>> lmer handles binomial data, so I will share the findings here. First,
>>>>>>> I have made two datasets available on the web, a long format and a
>>>>>>> wide format version of the "metaresp" data of Joop Hox..  They can be
>>>>>>> found here
>>>>>>>
>>>>>>> http://www.duke.edu/~bi6/
>>>>>>>
>>>>>>> Hox has a draft of the chapter of interest that discusses the
>>>>>>> metaresp dataset and the modeling process/problem that I am trying to
>>>>>>> solve.  Note that the results that I am trying to reproduce are in
>>>>>>> Tables 6.3 and 6.4.  The chapter can be found at:
>>>>>>>
>>>>>>> http://www.geocities.com/joophox/papers/chap6.pdf
>>>>>>>
>>>>>>> Now here are the models that I fit with lmer.  Assume that the wide
>>>>>>> version of the data is called "wide" and the long version "long".
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> y <- cbind(wide$SUCCESS, wide$FAIL)
>>>>>>>
>>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE), family=binomial, data=wide)
>>>>>>> summary(f1)
>>>>>>>
>>>>>>> f2 <- lmer(y ~ RESPISRR + TELDUM + MAILDUM + (1 | SOURCE),
>>>>>>> family=binomial, data=wide) summary(f2)
>>>>>>>
>>>>>>> f3 <- lmer(y ~  ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM + MAILDUM
>>>>>>> | SOURCE), family=binomial, data=wide) summary(f3)
>>>>>>>
>>>>>>> f4 <- lmer(SUCCESS ~ RESPISRR + TELDUM + MAILDUM + (1 + TELDUM +
>>>>>>> MAILDUM | SOURCE), family=binomial, data=long) summary(f4)
>>>>>>>
>>>>>>> -------------------------------
>>>>>>>
>>>>>>> Models f1, f2, and f4 work and reproduce the results of Hox.  Model
>>>>>>> f4 takes a hell of a long time to compute, but it seems to give the
>>>>>>> expected results.  Model f3, which I assumed (wrongly) would be the
>>>>>>> same as f4, does not work.  Instead, when it is run, you get the
>>>>>>> error message:
>>>>>>>
>>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>>
>>>>>>> I guess the question that I have now is: did I do something wrong
>>>>>>> with model f3 or is lmer doing something unusual?  My assumption that
>>>>>>> models f3 and f4 were the same comes from MASS4 p190 where Ripley
>>>>>>> describes the glm function for logistic regression.
>>>>>>>
>>>>>>> I very much appreciate any insight.
>>>>>>>
>>>>>>> Brant
>>>>>>>
>>>>>>> #####################################################################################
>>>>>>>
>>>>>>>
>>>>>>> On Friday, March 20, 2009, at 04:32AM, "ONKELINX, Thierry"
>>>>>>> <Thierry.ONKELINX at inbo.be> wrote:
>>>>>>>> Dear Brant,
>>>>>>>>
>>>>>>>> The model is too complex. You have maximum three observations for
>>>>>>>> each level of the random effect. Allowing for a random intercept
>>>>>>>> and two random slopes does not make much sense then. Does it?
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Thierry
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>  ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
>>>>>>>> Research Institute for Nature and Forest Cel biometrie,
>>>>>>>> methodologie en kwaliteitszorg / Section biometrics, methodology
>>>>>>>> and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium
>>>>>>>> tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be
>>>>>>>>
>>>>>>>> To call in the statistician after the experiment is done may be no
>>>>>>>> more than asking him to perform a post-mortem examination: he may
>>>>>>>> be able to say what the experiment died of. ~ Sir Ronald Aylmer
>>>>>>>> Fisher
>>>>>>>>
>>>>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>>>>
>>>>>>>> The combination of some data and an aching desire for an answer
>>>>>>>> does not ensure that a reasonable answer can be extracted from a
>>>>>>>> given body of data. ~ John Tukey
>>>>>>>>
>>>>>>>> -----Oorspronkelijk bericht----- Van:
>>>>>>>> r-sig-mixed-models-bounces at r-project.org
>>>>>>>> [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Brant
>>>>>>>> Inman Verzonden: donderdag 19 maart 2009 3:11 Aan:
>>>>>>>> r-sig-mixed-models at r-project.org Onderwerp: [R-sig-ME] Multilevel
>>>>>>>> logistic regression
>>>>>>>>
>>>>>>>>
>>>>>>>> lmer Experts:
>>>>>>>>
>>>>>>>> I am trying to use lmer to duplicate the results found in Joop
>>>>>>>> Hox's book "Multilevel Analysis: technique and applications" 2002.
>>>>>>>> In chapter 6 of his book he shows an example of multilevel logistic
>>>>>>>>  regression for a meta-analysis of survey response rates.  The data
>>>>>>>> are available in the file "metaresp.xls" at his website:
>>>>>>>>
>>>>>>>> <http://www.geocities.com/joophox/mlbook/excelxls.zip>
>>>>>>>>
>>>>>>>> The dataset includes the following variables of interest:
>>>>>>>>
>>>>>>>> Individual level (Level 1) variables: TELDUM         = telephone
>>>>>>>> questioning MAILDUM  = mail questioning RESPONSE = the outcome of
>>>>>>>> interest, the study response rate DENOM    = the number of people
>>>>>>>> questioned
>>>>>>>>
>>>>>>>> Study/group level (Level 2) variables: SOURCE        = the study
>>>>>>>> identifier YEAR      = year of study SALIENCY = how salient the
>>>>>>>> questionnaire was (0 to 2) RESPISRR = the way the response rate was
>>>>>>>> calculated
>>>>>>>>
>>>>>>>>
>>>>>>>> The null model (Table 6.2) proposed by Joop is easy to fit:
>>>>>>>>
>>>>>>>> SUCCESS <- as.integer(RESPISRR*DENOM) y     <- cbind(SUCCESS,
>>>>>>>> DENOM-SUCCESS)
>>>>>>>>
>>>>>>>> f1 <- lmer(y ~ RESPISRR + (1 | SOURCE),
>>>>>>>> family=binomial(link=logit))
>>>>>>>>
>>>>>>>>
>>>>>>>> Joop then adds a couple Level 1 variables (Table 6.3):
>>>>>>>>
>>>>>>>> f2 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (1 | SOURCE),
>>>>>>>> family=binomial(link=logit))
>>>>>>>>
>>>>>>>>
>>>>>>>> He then says that these two Level 1 variables should be allowed to
>>>>>>>>  vary across studies (varying slopes).  When I try to fit what I
>>>>>>>> believe to be the correct model, I get an error
>>>>>>>>
>>>>>>>>
>>>>>>>> f3 <- lmer(y ~ RESPISRR + TELNUM + MAILDUM + (TELNUM | SOURCE) +
>>>>>>>> (MAILDUM | SOURCE) + (1 | SOURCE), family=binomial(link=logit))
>>>>>>>>
>>>>>>>> Error in mer_finalize(ans) : q = 240 > n = 105
>>>>>>>>
>>>>>>>>
>>>>>>>> Can anyone tell me what I am doing wrong here?  Thanks so much in
>>>>>>>>  advance.
>>>>>>>>
>>>>>>>> Brant Inman Duke University Medical Center
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>>>
>>>>>>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
>>>>>>>> schrijver weer en binden het INBO onder geen enkel beding, zolang
>>>>>>>> dit bericht niet bevestigd is door een geldig ondertekend document.
>>>>>>>> The views expressed in  this message and any annex are purely those
>>>>>>>> of the writer and may not be regarded as stating an official
>>>>>>>> position of INBO, as long as the message is not confirmed by a duly
>>>>>>>>  signed document.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>>
>>>>>>--
>>>>>>Ben Bolker
>>>>>>Associate professor, Biology Dep't, Univ. of Florida
>>>>>>bolker at ufl.edu / www.zoology.ufl.edu/bolker
>>>>>>GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>