[Rd] predict.glm returns different results for the same model

Duncan Murdoch murdoch.duncan at gmail.com
Fri Apr 27 16:28:16 CEST 2018


On 27/04/2018 9:25 AM, Hadley Wickham wrote:
> Hi all,
> 
> Very surprising (to me!) and mystifying result from predict.glm(): the
> predictions vary depending on whether or not I use ns() or
> splines::ns(). Reprex follows: >
> library(splines)
> 
> set.seed(12345)
> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
> mns <- c(3.4, 3.6)
> sds <- c(0.24, 0.35)
> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
> sds[dat$claim + 1]))
> dat <- dat[order(dat$wind), ]
> 
> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)
> 
> # The model coefficients are the same
> unname(coef(m1))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> unname(coef(m2))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> 
> # But the predictions are not!
> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
> unname(predict(m1, newdata = newdf))
> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
> unname(predict(m2, newdata = newdf))
> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
> 
> Is this a bug?

The two objects m1 and m2 differ more than they should, so this looks 
like a problem in glm, not just in predict.glm.

 > attr(m1$terms, "predvars")
list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783,
108.071583734075), intercept = FALSE))

 > attr(m2$terms, "predvars")
list(claim, splines::ns(wind, df = 5))

This appears to be due to a bug in the splines package.  There, the 
function splines:::makepredictcall.ns looks like this:

makepredictcall.ns <- function(var, call)
{
     if(as.character(call)[1L] != "ns") return(call)
     at <- attributes(var)[c("knots", "Boundary.knots", "intercept")]
     xxx <- call[1L:2L]
     xxx[names(at)] <- at
     xxx
}

The test fails for m2, because as.character(call)[1L] is "splines::ns" 
instead of "ns". I'll see if I can work out a better test and submit a 
patch.

Duncan Murdoch

> 
> (Motivating issue from this ggplot2 bug report:
> https://github.com/tidyverse/ggplot2/issues/2426)
> 
> Thanks!
> 
> Hadley



More information about the R-devel mailing list