Possible bug with glm.nb and starting values (PR#1695)

bcooper@hsph.harvard.edu bcooper@hsph.harvard.edu
Thu, 20 Jun 2002 18:59:36 +0200 (MET DST)


Full_Name: Ben Cooper
Version: 1.5.0
OS: linux
Submission from: (NULL) (134.174.187.90)


The help page for glm.nb (in MASS package) says that it takes "Any other
arguments for the glm() function except family"

One such argument is   start   "starting values for the parameters in the linear
predictor."

However, when called with starting values glm.nb returns:

   Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  : 
	  variable lengths differ

So it looks like this is either a bug, the documentation is inaccurate, or I'm
missing something (or some combination of the above).


An example:
________________________________________________________________


library(MASS)

y<-c(7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3,3,4)

lag1<-c(0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3,3)

lag2<-c(0,0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5,3)

lag3<-c(0,0,0,7,5,4,7,5,2,11,5,5,4,2,3,4,3,5,9,6,7,10,6,
12,6,3,5,3,9,13,0,6,1,2,0,1,0,0,4,5,1,5)


># first a poisson model which is OK 
>glm(y~lag1+lag2+lag3,family=poisson(link=identity))
Error: no valid set of coefficients has been found:please supply starting
values
In addition: Warning message: 
NaNs produced in: log(x) 

> #therefore try:
> glm(y~lag1+lag2+lag3,family=poisson(link=identity),start=c(2,0.1,0.1,0.1))

# and this works. However, negative binomial model is not OK:

> glm.nb(y~lag1+lag2+lag3,link=identity)
Error: no valid set of coefficients has been found:please supply starting
values
In addition: Warning message: 
NaNs produced in: log(x) 

> #so try
> glm.nb(y~lag1+lag2+lag3,link=identity,start=c(2,0.1,0.1,0.1))
Error in model.frame(formula, rownames, variables, varnames, extras, extranames,
 : 
	variable lengths differ

________________________________________________________________



I can get glm.nb to work with starting values with the following hack to the
glm.nb code:

i)

change the line
 m$method <- m$model <- m$x <- m$y <- m$control <- m$contrasts <- m$init.theta
<- m$link <- m$... <- NULL

 to 

 m$method <- m$model <- m$x <- m$y <- m$control <- m$contrasts <- m$init.theta
<- m$link <-m$start <- m$... <- NULL


[i.e. insert m$start]

ii) 

remove the line:

start <- model.extract(m, start)


iii)

change the line

fit <- glm.fitter(x = X, y = Y, w = w, etastart = start, 
        offset = offset, family = fam0, control = list(maxit = control$maxit, 
            epsilon = control$epsilon, trace = control$trace > 
                1))

to
   fit <- glm.fitter(x = X, y = Y, w = w, start=start,
        offset = offset, family = fam0, control = list(maxit = control$maxit, 
            epsilon = control$epsilon, trace = control$trace > 
                1))

iv) change default value of start to NULL (as in glm) rather than eta when
glm.nb is called.

newglm.nb with these changes then works.

But no doubt there  are reasons for things being the way they are and these
changes screw other things up.

$platform
[1] "i686-pc-linux-gnu"
$arch
[1] "i686"
$os
[1] "linux-gnu"
$system
[1] "i686, linux-gnu"
$status
[1] ""
$major
[1] "1"
$minor
[1] "5.0"
$year
[1] "2002"
$month
[1] "04"
$day
[1] "29"
$language
[1] "R"

I also tried this with version 1.4.1 under Windows and had the same probelm.

cheers

Ben




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._