[R] ggplot2: stat_smooth for family=binomial with cbind(Y, N) formula

Michael Friendly friendly at yorku.ca
Tue Dec 17 19:42:21 CET 2013


Thanks very much for this helpful reply, Thierry

Using aes(weight=trials) in stat_smooth() was part of what I was missing 
and solves my main
question.
However, for this data, I want to show the extrapolated prediction over 
a wider range than in
the data.  Adding xlim() doesn't help here-- the plot annotations are 
cut off at the lowest
value of Temperature in the data. Is there another way?

ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
     xlim(30, 81) +
     geom_point() +
     geom_smooth(method = "glm", family = binomial, aes(weight = trials))

Below is the complete (but messy) code for the plot() I would like to 
more or less replicate using
ggplot()

data("SpaceShuttle", package="vcd")
logit2p <- function(logit) 1/(1 + exp(-logit))

plot(nFailures/6 ~ Temperature, data = SpaceShuttle,
      xlim = c(30, 81), ylim = c(0,1),
      main = "NASA Space Shuttle O-Ring Failures",
      ylab = "Estimated failure probability",
      xlab = "Temperature (degrees F)",
      type="n")  # painters model: add points last

fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature,
           data = SpaceShuttle,
           family = binomial)
pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)

predicted <- data.frame(
     Temperature = 30 : 81,
     prob = logit2p(pred$fit),
     lower = logit2p(pred$fit - 2*pred$se),
     upper = logit2p(pred$fit + 2*pred$se)
     )

with(predicted, {
     polygon(c(Temperature, rev(Temperature)),
             c(lower, rev(upper)), col="lightpink", border=NA)
     lines(Temperature, prob, lwd=3)
     }
     )
with(SpaceShuttle,
     points(Temperature, nFailures/6, col="blue", pch=19, cex=1.3)
     )

I also tried following the example in given in ?geom_smooth, to supply 
the predicted
values over my range of x values, and lower/upper limits in a separate 
data frame:

pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)
predicted <- data.frame(
     Temperature = 30 : 81,
     prob = logit2p(pred$fit),
     lower = logit2p(pred$fit - 2*pred$se),
     upper = logit2p(pred$fit + 2*pred$se)
     )
ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
   geom_smooth(aes(ymin = lower, ymax = upper), data=predicted, 
stat="identity")

but this gives an error:

 > ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) +
+   geom_smooth(aes(ymin = lower, ymax = upper), data=predicted, 
stat="identity")
Error in eval(expr, envir, enclos) : object 'nFailures' not found
 >

-Michael


On 12/17/2013 11:26 AM, ONKELINX, Thierry wrote:
> Dear Michael,
>
> Calculate the propotions. Then it is easy to use the weight option of glm
>
> data("SpaceShuttle", package="vcd")
> SpaceShuttle$trials <- 6
>
> fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature, data = SpaceShuttle, family = binomial)
> fm2 <- glm(nFailures/trials ~ Temperature, data = SpaceShuttle, family = binomial, weight = trials)
> all.equal(coef(fm), coef(fm2))
>
> ggplot(SpaceShuttle, aes(x = Temperature, y = nFailures / trials)) + geom_point() + geom_smooth(method = "glm", family = binomial, aes(weight = trials))
>
> Best regards,
>
> Thierry
>
> -----Oorspronkelijk bericht-----
> Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Namens Michael Friendly
> Verzonden: dinsdag 17 december 2013 14:58
> Aan: R-help
> Onderwerp: [R] ggplot2: stat_smooth for family=binomial with cbind(Y, N) formula
>
> With ggplot2, I can plot the glm stat_smooth for binomial data when the response is binary or a two-level factor as follows:
>
> data("Donner", package="vcdExtra")
> ggplot(Donner, aes(age, survived)) +
> geom_point(position = position_jitter(height = 0.02, width = 0)) + stat_smooth(method = "glm", family = binomial, formula = y ~ x, alpha = 0.2, size=2)
>
> But how can I specify the formula for stat_smooth when the response is cbind(successes, failures)?
> The equivalent with plot (minus the confidence band) for the example I want is:
>
> data("SpaceShuttle", package="vcd")
>
>   > head(SpaceShuttle, 5)
>     FlightNumber Temperature Pressure Fail nFailures Damage
> 1            1          66       50   no         0      0
> 2            2          70       50  yes         1      4
> 3            3          69       50   no         0      0
> 4            4          80       50 <NA>        NA     NA
> 5            5          68       50   no         0      0
>   >
>
> plot(nFailures/6 ~ Temperature, data = SpaceShuttle,
>        xlim = c(30, 81), ylim = c(0,1),
>        main = "NASA Space Shuttle O-Ring Failures",
>        ylab = "Estimated failure probability",
>        xlab = "Temperature (degrees F)",
>        pch = 19, col = "blue", cex=1.2)
> fm <- glm(cbind(nFailures, 6 - nFailures) ~ Temperature,
>             data = SpaceShuttle,
>             family = binomial)
> pred <- predict(fm, data.frame(Temperature = 30 : 81), se=TRUE)
> lines(30 : 81,
>         predict(fm, data.frame(Temperature = 30 : 81), type = "response"),
>         lwd = 3)
>
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
>
> ______________________________________________
>

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



More information about the R-help mailing list