[Rd] glm start/offset bugs (PR#1422)

presnell@stat.ufl.edu presnell@stat.ufl.edu
Fri, 29 Mar 2002 04:04:03 +0100 (MET)


--fupGvOGOQM
Content-Type: text/plain; charset=us-ascii
Content-Description: message body and .signature
Content-Transfer-Encoding: 7bit


There's a simple bug in the handling of the start and offset arguments
in glm and glm.fit.  The bug exists in the latest development version
of R (version information below), but it appears that glm.R has not
been touched much lately, so the bug affects at least the most recent
stable release of R.

Here is a simple example:

> data(ships, package=MASS)
> ships.glm <- glm(incidents ~ type + year + period + offset(log(service)),
+ 		  family=poisson, data=ships, subset=service != 0)
> glm(incidents ~ type + year + period + offset(log(service)),
+     family=poisson, data=ships, subset=service != 0, start=coef(ships.glm))

or more simply

> update(ships.glm, start=coef(ships.glm))

The problem is caused by a bad initialization of etastart in glm.fit()
when an offset is present.  Fixing this and retrying the example above
then reveals another bug, this time in glm(), that is tickled only
when non-NULL values are given to both the offset and start arguments.

I've attached to the bottom of this message a simple patch to
src/library/base/R/glm.R that I believe correctly repairs these bugs.

Essentially the same change is needed in Berwin Turlach's Wed, 27 Feb
2002 version of glm.fit() reported on the R bug tracking site as
"Models/1331".  (The change to glm() is still required as well of
course.)  As an aside, I'm wondering if there has been any thought
given to adopting Berwin's patches.  Though I must admit that I
haven't taken the time to check carefully through his changes, I know
Berwin to be a very careful and skillful programmer, and I suspect
that he is the only person to have looked very closely at those
particular details of glm.fit in recent months (years?).  Also, some
of the simpler changes that he made seem to be needed just to cleanup
the code.  Would it be useful if a number of us began using his
version as an informal test?  After discovering this bug earlier
today, I have already begun to do so and so far have not encountered
any problems.

-- 
Brett Presnell
Department of Statistics
University of Florida
http://www.stat.ufl.edu/~presnell/


Version:
 platform = i686-pc-linux-gnu
 arch = i686
 os = linux-gnu
 system = i686, linux-gnu
 status = Under development (unstable)
 major = 1
 minor = 5.0
 year = 2002
 month = 03
 day = 27
 language = R

Search Path:
 .GlobalEnv, package:ctest, Autoloads, package:base



--fupGvOGOQM
Content-Type: text/plain
Content-Description: glm patch
Content-Disposition: inline;
	filename="glm.patch"
Content-Transfer-Encoding: 7bit

*** glm.R	Wed Feb 27 01:11:03 2002
--- glm-patched.R	Thu Mar 28 21:09:24 2002
***************
*** 67,71 ****
  	    if(is.empty.model(mt)) fit$deviance
  	    else glm.fit(x=X[,"(Intercept)",drop=FALSE], y=Y, weights=weights,
! 			 start=start, offset=offset, family=family,
  			 control=control, intercept=TRUE)$deviance
      }
--- 67,71 ----
  	    if(is.empty.model(mt)) fit$deviance
  	    else glm.fit(x=X[,"(Intercept)",drop=FALSE], y=Y, weights=weights,
! 			 offset=offset, family=family,
  			 control=control, intercept=TRUE)$deviance
      }
***************
*** 152,156 ****
  			   "and correspond to initial coefs for",
  			   deparse(xnames)))
! 	    else as.vector(if (NCOL(x) == 1) x * start else x %*% start)
  	else family$linkfun(mustart)
      mu <- linkinv(eta)
--- 152,157 ----
  			   "and correspond to initial coefs for",
  			   deparse(xnames)))
! 	    else offset +
!               as.vector(if (NCOL(x) == 1) x * start else x %*% start)
  	else family$linkfun(mustart)
      mu <- linkinv(eta)

--fupGvOGOQM--

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._