[R] Optimization problem

Wed Aug 22 13:44:39 CEST 2007

Try this.

1. following Ben remove the Randalstown point and reset the levels of the
Location factor.

2. then replace solve with ginv so it uses the generalized inverse to calculate
the hessian:

alan2 <- subset(alan, subset = Location != "Randalstown")
alan2$Location <- factor(as.character(alan2$Location))

library(MASS)
solve <- ginv

zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass
+ Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data = alan2)

rm(solve)

On 8/21/07, Ben Bolker <bolker at zoo.ufl.edu> wrote:
>
>  (Hope this gets threaded properly.  Sorry if it doesn't.)
>
>   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
> produce NAs (but would produce something like a singular Hessian
> and maybe other problems) -- but they're not even specified in this
> model.
>
>  The bottom line is that you have a location with a single
> observation, so the GLM that zicounts runs to get the initial
> parameter values has an unestimable location:mass interaction
> for one location, so it gives an NA, so optim complains.
>
>  In gruesome detail:
>
> ## set up  data
> scardat = read.table("scars.dat",header=TRUE)
> library(zicounts)
> ## try to run model
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scardat)
> ## tried to debug this by dumping zicounts.R to a file, modifying
> ## it to put a "trace" argument in that would print out the parameters
> ## and log-likelihood for every call to the log-likelihood function.
> dump("zicounts",file="zicounts.R")
> source("zicounts.R")
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scardat,trace=TRUE)
> ## this actually didn't do any good because the negative log-likelihood
> ## function never gets called -- as it turns out optim() barfs when it
> ## gets its initial values, before it ever gets to evaluating the
> log-likelihood
>
> ## check the glm -- this is the equivalent of what zicounts does to
> ## get the initial values of the x parameters
> p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
>          data=scardat,family="poisson")
> which(is.na(coef(p1)))
>
> ## find out what the deal is
> table(scardat$Location)
>
> scar2 = subset(scardat,Location!="Randalstown")
> ## first step to removing the bad point from the data set -- but ...
> table(scar2$Location)
> ## it leaves the Location factor with the same levels, so
> ##  now we have ZERO counts for one location:
> ## redefine the factor to drop unused levels
> scar2$Location <- factor(scar2$Location)
> ## OK, looks fine now
> table(scar2$Location)
>
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scar2)
> ## now we get another error ("system is computationally singular" when
> ## trying to compute Hessian -- overparameterized?)   Not in any
> ## trivial way that I can see.  It would be nice to get into the guts
> ## of zicounts and stop it from trying to invert the Hessian, which is
> ## I think where this happens.
>
>  In the meanwhile, I have some other  ideas about this analysis (sorry,
> but you started it ...)
>
>  Looking at the data in a few different ways:
>
> library(lattice)
> xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
>       auto.key=list(columns=3))
> xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)
>
> xyplot(Scars~Lar,groups=Location,data=scar2,
>       auto.key=list(columns=3))
> xyplot(Scars~Mass|Lar,data=scar2)
> xyplot(Scars~Lar|Location,data=scar2)
>
>   Some thoughts: (1) I'm not at all sure that
> zero-inflation is necessary (see Warton 2005, Environmentrics).
> This is a fairly small, noisy data set without huge numbers
> of zeros -- a plain old negative binomial might be fine.
>
>   I don't actually see a lot of signal here, period (although there may
> be some) ...
> there's not a huge range in Lar (whatever it is -- the rest of the
> covariates I
> think I can interpret).  It would be tempting to try to fit location as
> a random
> effect, because fitting all those extra degrees of freedom is going to
> kill you.
> On the other hand, GLMMs are a bit hairy.
>
>   cheers
>       Ben
>
>