[R-sig-ME] What to do when a subset of binomial data has only positive outcomes

Timothy_Handley at nps.gov Timothy_Handley at nps.gov
Fri Jul 30 16:28:28 CEST 2010


I had this problem with glm. My solution was to create a likelihood
function, and use the optim or optimize function to find an appropriate
confidence interval. More specifically, I found the likelihood of the data
for the best-estimate parameters, then found the range of parameters for
which a likelihood-ratio-test (best-estimate params vs. current params) had
a p-value>.025. The range is then the 95% confidence interval for those
parameters. This is essentially replicating the functionality of glm by
hand, as it seemed unable to deal with data which is all positive or all
negative.

While I understand how to do this with glm, I don't understand enough about
fixed effects models to do the same for lmer. If you were willing to treat
everything as a fixed effect and switch to glm, then you could e-mail me,
and I could offer some more specific advice on how to do this.

If anyone else has a better solution, I'd be very interested to hear it.

Tim Handley
Fire Effects Monitor
Santa Monica Mountains National Recreation Area
401 W. Hillcrest Dr.
Thousand Oaks, CA 91360
805-370-2300 x2412

Date: Thu, 29 Jul 2010 17:57:35 -0700
From: Xiao He <praguewatermelon at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] What to do when a subset of binomial data has only
             positive outcomes
Message-ID:
             <AANLkTi=zskcerd9fuELVQZ833mYPRkCddYpRUdVvQq7- at mail.gmail.com>
Content-Type: text/plain

Dear R users and experts,

I tried to fit a model as shown below:

data.lmer<-lmer(word1~compoundType*nativelanguage + (1|subject) + (1|word),
data=data, family="binomial").

the factor 'compoundType' has three levels: blue, green, red.
the factor nativelanguage has two levels: english, other

I obtained the following results:
************************************************
Fixed effects:
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                    -1.0190     0.5777  -1.764   0.0778 .
nativelanguageother            -0.6862     0.4075  -1.684   0.0922 .
typegreen                      19.7882   728.0627   0.027   0.9783
typered                         3.6985     0.8359   4.425 9.65e-06 ***
nativelanguageother:typegreen -16.1915   728.0624  -0.022   0.9823
nativelanguageother:typered    -2.8106     0.5639  -4.984 6.23e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
************************************************

As you can see, the standard errors and estimates for "typegreen" as well
as
"nativelanguageother:typegreen" are enormous. I examined my data and
realized that the responses obtained for the level "green" of compoundType
were all positive. But the responses of "green" were clearly significantly
different from the base level "blue" as green was 100% positive responses,
whereas blue had only 30% of positive responses.

I wonder if there is any way to conduct statistical analyses to show that
green is significantly different from blue (and potentially red) because
obviously I can't just say they are different :)...

Thank you in advance for your help!



Best,

Xiao He
Graduate student
Department of Linguistics
University of Southern California


More information about the R-sig-mixed-models mailing list