[Rd] predict.glm returns different results for the same model

Martin Maechler maechler at stat.math.ethz.ch
Fri Apr 27 18:51:26 CEST 2018


>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Fri, 27 Apr 2018 10:28:16 -0400 writes:

    > On 27/04/2018 9:25 AM, Hadley Wickham wrote:
    >> Hi all,
    >> 
    >> Very surprising (to me!) and mystifying result from predict.glm(): the
    >> predictions vary depending on whether or not I use ns() or
    >> splines::ns(). Reprex follows: >
    >> library(splines)
    >> 
    >> set.seed(12345)
    >> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
    >> mns <- c(3.4, 3.6)
    >> sds <- c(0.24, 0.35)
    >> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
    >> sds[dat$claim + 1]))
    >> dat <- dat[order(dat$wind), ]
    >> 
    >> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
    >> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)
    >> 
    >> # The model coefficients are the same
    >> unname(coef(m1))
    >> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
    >> unname(coef(m2))
    >> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
    >> 
    >> # But the predictions are not!
    >> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
    >> unname(predict(m1, newdata = newdf))
    >> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
    >> unname(predict(m2, newdata = newdf))
    >> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
    >> 
    >> Is this a bug?

    > The two objects m1 and m2 differ more than they should, so this looks 
    > like a problem in glm, not just in predict.glm.

    >> attr(m1$terms, "predvars")
    > list(claim, ns(wind, knots = c(25.4756277492997, 30.2270250736796,
    > 35.4093171222502, 43.038645381669), Boundary.knots = c(12.9423820390783,
    > 108.071583734075), intercept = FALSE))

    >> attr(m2$terms, "predvars")
    > list(claim, splines::ns(wind, df = 5))

    > This appears to be due to a bug in the splines package.  There, the 
    > function splines:::makepredictcall.ns looks like this:

    > makepredictcall.ns <- function(var, call)
    > {
    > if(as.character(call)[1L] != "ns") return(call)
    > at <- attributes(var)[c("knots", "Boundary.knots", "intercept")]
    > xxx <- call[1L:2L]
    > xxx[names(at)] <- at
    > xxx
    > }

    > The test fails for m2, because as.character(call)[1L] is "splines::ns" 
    > instead of "ns". I'll see if I can work out a better test and submit a 
    > patch.

    > Duncan Murdoch

Thank you, Duncan, for the good diagnosis and the patch!

I will deal with it.

Two things however (both *not* needing a new patch; I'll deal
    	   with it after dinner ;-)

1) I'd like to commit that ASAP, but do want to add a *minimal*
   Reprex for regression test, rather than the above, notably as I
   see that the only place our code calls  makepredictcall() is in
   model.frame.default(), so I thought this should not really be
   related to glm at all, and I was right.

2) Reading the help page ?makepredictcall
    --- yes, a good idea, even for experts, believe me, ((notably when it is from	 a base package, not produced from roxy.. ;-\)) ---
   reveals what I vaguely remembered:
   The function was introduced to fix a bug first encountered in
      lm(. ~  poly(.))

  and indeed:

  -> That help page actually contains an indirect closer to minimal reprex.
  -> I will also patch the  makepredictcall.poly() method.

  [ as I said: after dinner .. ;-)]

--
Martin



More information about the R-devel mailing list