[Rd] glm start/offset bugs (PR#1422)

Prof Brian D Ripley ripley@stats.ox.ac.uk
Fri, 29 Mar 2002 06:25:11 +0000 (GMT)

On Fri, 29 Mar 2002 presnell@stat.ufl.edu wrote:

> There's a simple bug in the handling of the start and offset arguments
> in glm and glm.fit.  The bug exists in the latest development version
> of R (version information below), but it appears that glm.R has not
> been touched much lately, so the bug affects at least the most recent
> stable release of R.
> Here is a simple example:
> > data(ships, package="MASS")
> > ships.glm <- glm(incidents ~ type + year + period + offset(log(service)),
> + 		  family=poisson, data=ships, subset=service != 0)
> > glm(incidents ~ type + year + period + offset(log(service)),
> +     family=poisson, data=ships, subset=service != 0, start=coef(ships.glm))
> or more simply
> > update(ships.glm, start=coef(ships.glm))
> The problem is caused by a bad initialization of etastart in glm.fit()
> when an offset is present.  Fixing this and retrying the example above
> then reveals another bug, this time in glm(), that is tickled only
> when non-NULL values are given to both the offset and start arguments.
> I've attached to the bottom of this message a simple patch to
> src/library/base/R/glm.R that I believe correctly repairs these bugs.
> Essentially the same change is needed in Berwin Turlach's Wed, 27 Feb
> 2002 version of glm.fit() reported on the R bug tracking site as
> "Models/1331".  (The change to glm() is still required as well of
> course.)  As an aside, I'm wondering if there has been any thought
> given to adopting Berwin's patches.  Though I must admit that I
> haven't taken the time to check carefully through his changes, I know
> Berwin to be a very careful and skillful programmer, and I suspect
> that he is the only person to have looked very closely at those
> particular details of glm.fit in recent months (years?).  Also, some
> of the simpler changes that he made seem to be needed just to clean up
> the code.  Would it be useful if a number of us began using his
> version as an informal test?  After discovering this bug earlier
> today, I have already begun to do so and so far have not encountered
> any problems.
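
A small R sketch of the invariant behind the etastart problem may help. It uses hypothetical simulated data (`d`, `y`, `x1`, `service`) instead of the ships data, and it only checks behaviour in a current R. It is not the patch mentioned above, whose code is not reproduced here. It assumes the intended initialisation is eta = offset + X %*% start, which is the same quantity glm() reports in `linear.predictors`.

```r
# Hypothetical simulated data standing in for MASS::ships:
# counts y with exposure 'service' and one covariate x1.
set.seed(1)
n <- 50
d <- data.frame(service = rpois(n, 20) + 1, x1 = rnorm(n))
d$y <- rpois(n, d$service * exp(0.3 * d$x1 - 2))

# Poisson fit with the offset supplied inside the formula.
fit <- glm(y ~ x1 + offset(log(service)), family = poisson, data = d)

X <- model.matrix(fit)      # design matrix; the offset is not a column
start <- coef(fit)
off <- log(d$service)

# Initial linear predictor implied by 'start': the offset must be added.
eta_start <- off + drop(X %*% start)
# Leaving the offset out gives a different, wrong starting point.
eta_no_offset <- drop(X %*% start)

# Refit from the converged coefficients, offset in the formula ...
refit <- update(fit, start = coef(fit))
# ... and with the offset passed via the 'offset' argument, so that
# both 'offset' and 'start' are non-NULL.
refit2 <- glm(y ~ x1, family = poisson, data = d,
              offset = log(service), start = coef(fit))

print(all.equal(unname(eta_start), unname(fit$linear.predictors)))
print(all.equal(coef(refit), coef(fit), tolerance = 1e-6))
print(all.equal(coef(refit2), coef(fit), tolerance = 1e-6))
```

The last two comparisons exercise the combination of `start` with an offset, in both the formula and the argument form, that the report describes; starting from the converged coefficients, both refits should reproduce them.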

Yes, thought has been given, and the code has been looked at. (People
do look at R code, you know, even if they don't jump in to change it
after a few hours' use.)

Given the time pressures, this is not going to get looked at for 1.5.0. I
have long thought that, given the history of problems maintaining glm(),
it might be easier to rewrite it from scratch.  Having .null versions
serves no useful purpose, for example.

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch