[R-sig-eco] offset in Logistic regression

Andrew Muir ammuir at purdue.edu
Thu Jul 17 17:27:35 CEST 2008


Dear list,

I am attempting to include an offset variable (Julian Day) in a series of 
logistic models using the glm() function in R.
I want to account for Juliann day of sampling in each of these models 
because sample collection was temporally stratified across the entire year.
The models work fine without the offset variable, but when I include it I 
get the following error messages.

> #Full model (Global hypothesis)
> M1= glm(Presence~Lat + Depth + Cliff + Temp_var + offset(J_Day), 
> family=binomial(link=logit))
Warning messages:
1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = 
etastart,  :
  algorithm did not converge
2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = 
etastart,  :
  fitted probabilities numerically 0 or 1 occurred
3: In glm.fit(x = X[, "(Intercept)", drop = FALSE], y = Y, weights = 
weights,  :
  fitted probabilities numerically 0 or 1 occurred
> summary (M1)

Warning messages:
1: In method(x = x[, varseq <= i, drop = FALSE], y = object$y, weights = 
object$prior.weights,  :
  algorithm did not converge
2: In method(x = x[, varseq <= i, drop = FALSE], y = object$y, weights = 
object$prior.weights,  :
  fitted probabilities numerically 0 or 1 occurred
3: In method(x = x[, varseq <= i, drop = FALSE], y = object$y, weights = 
object$prior.weights,  :
  fitted probabilities numerically 0 or 1 occurred
4: In method(x = x[, varseq <= i, drop = FALSE], y = object$y, weights = 
object$prior.weights,  :
  fitted probabilities numerically 0 or 1 occurred
>


I spent considerable time on the internet trying to figure out what the 
problem is and I came across the following bug report from 2002:

[Rd] glm start/offset bugs (PR#1422)

presnell at stat.ufl.edu presnell at stat.ufl.edu
Fri, 29 Mar 2002 04:04:03 +0100 (MET)

  a.. Previous message: [Rd] pos argument to library
  b.. Next message: [Rd] glm start/offset bugs (PR#1422)
  c.. Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

--------------------------------------------------------------------------------

--fupGvOGOQM

Content-Type: text/plain; charset=us-ascii

Content-Description: message body and .signature

Content-Transfer-Encoding: 7bit





There's a simple bug in the handling of the start and offset arguments

in glm and glm.fit.  The bug exists in the latest development version

of R (version information below), but it appears that glm.R has not

been touched much lately, so the bug affects at least the most recent

stable release of R.



Here is a simple example:



> data(ships, package=MASS)

> ships.glm <- glm(incidents ~ type + year + period + offset(log(service)),

+                family=poisson, data=ships, subset=service != 0)

> glm(incidents ~ type + year + period + offset(log(service)),

+     family=poisson, data=ships, subset=service != 0, 
start=coef(ships.glm))



or more simply



> update(ships.glm, start=coef(ships.glm))



The problem is caused by a bad initialization of etastart in glm.fit()

when an offset is present.  Fixing this and retrying the example above

then reveals another bug, this time in glm(), that is tickled only

when non-NULL values are given to both the offset and start arguments.



I've attached to the bottom of this message a simple patch to

src/library/base/R/glm.R that I believe correctly repairs these bugs.



Essentially the same change is needed in Berwin Turlach's Wed, 27 Feb

2002 version of glm.fit() reported on the R bug tracking site as

"Models/1331".  (The change to glm() is still required as well of

course.)  As an aside, I'm wondering if there has been any thought

given to adopting Berwin's patches.  Though I must admit that I

haven't taken the time to check carefully through his changes, I know

Berwin to be a very careful and skillful programmer, and I suspect

that he is the only person to have looked very closely at those

particular details of glm.fit in recent months (years?).  Also, some

of the simpler changes that he made seem to be needed just to cleanup

the code.  Would it be useful if a number of us began using his

version as an informal test?  After discovering this bug earlier

today, I have already begun to do so and so far have not encountered

any problems.




I sent an e-mail to Dr. Presnell (the person that wrote the bug report), but 
to-date I have not received a response from him.
So, my questions to the list are:
1. does anyone know if this bug has been fixed in R 2.6.1? (Dr. Presnell 
included a patch for this problem in his bug report, but I don't know if I 
should try to install the patch)
2. perhaps my problem is not with the glm.fit function, but a problem with 
the structure of my data or with my use of the offset command, if so, can 
anyone provide suggestions?

Thanks in advance,

AMM.

Andrew M. Muir
Purdue University,
Department of Forestry and Natural Resources,
Pfendler Hall of Agriculture, Rm G004
715 W. State St.
West Lafayette, IN  47907
Voice: 765-494-9597
FAX: 765-496-2422
email: ammuir at purdue.edu



More information about the R-sig-ecology mailing list