[R] 95% confidence intercal with glm

Ben Bolker bbolker at gmail.com
Wed Sep 29 18:30:51 CEST 2010


  [I'm a little confused: are you "Sam Smith" or "Chris Mcowen" ... ?]

  This is admittedly a bit confusing, but the best scale on which to
compute standard errors is the link scale.
   It turns out (I hadn't realized this) that predict.glm does give
you not-crazy answers when you ask for
se.fit=TRUE on the response scale, in which case you can (a)  just
take +/- 1.96 SE around the response-scale fit and be done.
However, these are less accurate than doing what I suggested ((b)
computing fit +/- 1.96 SE on the link scale and
inverse-link-transforming).  Among other things, method "a" produces
confidence intervals that are exactly symmetric (generally
not true) and that are not necessarily bounded within the appropriate
range -- for example, the lower 95% CI bound for females
in the example below is slightly < 0 ...

## extended example from ?predict.glm
require(graphics)

## example from Venables and Ripley (2002, pp. 190-2.)
ldose <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
SF <- cbind(numdead, numalive=20-numdead)
budworm.lg <- glm(SF ~ sex*ldose, family=binomial)
summary(budworm.lg)

ci.scale <- 1.96
plot(c(1,32), c(0,1), type = "n", xlab = "dose",
     ylab = "prob", log = "x", las=1, bty="l")
text(2^ldose, numdead/20, as.character(sex), col=c("red","black")[sex])
ld <- seq(0, 5, 0.1)
pM <- predict(budworm.lg,
              data.frame(ldose=ld,
                         sex=factor(rep("M", length(ld)),
                           levels=levels(sex))),
              type = "response",se.fit=TRUE)
lines(2^ld, pM$fit )
matlines(2^ld,pM$fit+outer(pM$se.fit,ci.scale*c(-1,1)),lty=2,col=1)
pF <- predict(budworm.lg,
              data.frame(ldose=ld,
                         sex=factor(rep("F", length(ld)),
                           levels=levels(sex))),
              type = "response",se.fit=TRUE)
lines(2^ld, pF$fit,col=2)
matlines(2^ld,pF$fit+outer(pF$se.fit,ci.scale*c(-1,1)),lty=2,col=2)

pM_ex <- predict(budworm.lg,
              data.frame(ldose=ld,
                         sex=factor(rep("M", length(ld)),
                           levels=levels(sex))),
              type = "link",se.fit=TRUE)
with(pM_ex,
    
matlines(2^ld,plogis(fit+outer(se.fit,ci.scale*c(-1,1))),lty=3,col=1))

pF_ex <- predict(budworm.lg,
              data.frame(ldose=ld,
                         sex=factor(rep("F", length(ld)),
                           levels=levels(sex))),
              type = "link",se.fit=TRUE)
with(pF_ex,
    
matlines(2^ld,plogis(fit+outer(se.fit,ci.scale*c(-1,1))),lty=3,col=2))

legend("bottomright",
       c("prediction",
         "+/- 2 SE (response scale)",
         "+/- 2 SE (link scale)"),
       lty=1:3)
      



On 10-09-29 10:38 AM, Chris Mcowen wrote:
> Right, that makes sense, thanks
>
> The reason i used type= response was i wanted to convert the predicted
probabilities to the response scale, as surely this is the scale at
which a 95CI value is most useful for?
>
> I.e
>
>>> pp <- predict(model1,se.fit=TRUE, type = "response")
>
>
> 1  0.68
>
> Probability of point 1 having a response of 1 = 0.68 - # this is above
the cutoff therefore this has a response of 1
>
> Then it would be of most use to get the 95CI on this scale to see if
the probability straddles the cutoff value?
>
> Maybe i am missing something?
>
> Thanks
>
> 
> On 29 Sep 2010, at 15:27, Ben Bolker wrote:
>
> On 10-09-29 10:04 AM, Sam wrote:
>> Hi Ben and list,
>>
>> Sorry to be a pain! I have followed your code, and modified it -
>>
>
>  **You should not use type="response" here.**
>  The point is that the (symmetric) confidence intervals are computed on
> the link/linear predictor
> scale, and then inverse-link-transformed (in this case, along with the
> fitted values) -- they then
> become asymmetric.
>
>>> pp <- predict(model1,se.fit=TRUE, type = "response")
>>>> etaframe <-
>>> + with(pp,cbind(fit,lower=fit-1.96*se.fit,upper=fit+1.96*se.fit))
>>>> pframe <- plogis(etaframe)
>>>> pframe
>>
>> My response variable is 0 or 1, the probabilities are all high above my
> cut-off point of 0.445, i think this may have something to do with
>>
>>> you may need to multiply by the binomial N as
>>> appropriate.
>>
>> However how do i multiply if my response is 0 or 1?
>>
>
>  if 'size=1' (i.e. Bernoulli trials) then you would be multiplying by
> 1, so don't bother.
>
>>
>> On 29 Sep 2010, at 13:44, Ben Bolker wrote:
>>
>>
>> ## from ?glm
>> counts <- c(18,17,15,20,10,20,25,13,12)
>> outcome <- gl(3,1,9)
>> treatment <- gl(3,3)
>> d.AD <- data.frame(treatment, outcome, counts)
>> glm.D93 <- glm(counts ~ outcome + treatment, family=poisson,
>>             data=d.AD)
>>
>> ## predict on 'link' or 'linear predictor' scale, with SEs
>> pp <- predict(glm.D93,se.fit=TRUE)
>> etaframe <-
>> with(pp,cbind(fit,lower=fit-1.96*se.fit,upper=fit+1.96*se.fit))
>> pframe <- exp(etaframe)  ## inverse-link
>> ## picture
>> pframe <- as.data.frame(cbind(obs=d.AD$counts,pframe))
>> library(plotrix)
>> plot(pframe$obs,ylim=c(5,30))
>> with(pframe,plotCI(1:9,fit,li=lower,ui=upper,col=2,add=TRUE))
>>
>> If you're using a binomial model you need 'plogis' rather than 'exp'
>> as your
>> inverse link, and you may need to multiply by the binomial N as
>> appropriate.
>>
>> On 10-09-29 06:07 AM, Sam wrote:
>>> I am looking to do the same but am a bit confused
>>
>>>> and apply the inverse link function for your model.
>>
>>> i understand up to this point and i understand what this means,
>>> however i am unsure why it needs to be done and how you do it - i.e
>>> i use family="binomial" is this wrong if i use this method to
>>> calculate 95% CI?
>>
>>> Thanks
>>
>>> Sam On 28 Sep 2010, at 14:50, Ben Bolker wrote:
>>
>>> zozio32 <remy.pascal <at> gmail.com> writes:
>>
>>>>
>>>>
>>>> Hi
>>>>
>>>> I had to use a glm instead of my basic lm on some data due to
>>>> unconstant variance.
>>>>
>>>> now, when I plot the model over the data, how can I easily get
>>>> the 95% confidence interval that sormally coming from:
>>>>
>>>>> yv <- predict(modelVar,list(aveLength=xv),int="c")
>>>>> matlines(xv,yv,lty=c(1,2,2))
>>>>
>>>> There is no "interval" argument to pass to the predict function
>>>> when using a glm, so I was wondering if I had to use an other
>>>> function
>>>>
>>
>>> You need to use predict with se=TRUE; construct the confidence
>>> intervals by computing predicted values +- 1.96 times the standard
>>> error returned; and apply the inverse link function for your
>>> model.
>>
>>> If heteroscedasticity is your main problem, and not a specific
>>> (known) non-normal distribution, you might consider using the gls
>>> function from the nlme package with an appropriate 'weights'
>>> argument.
>>
>>> ______________________________________________ R-help at r-project.org
>>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>>> read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list