[R] glm cannot find valid starting values
Dan Bebber
danbebber at forestecology.co.uk
Fri Jul 21 13:35:58 CEST 2006
glm(S ~ -1 + Mdif, family=quasipoisson(link=identity), start=strt, sdat)
gives error:
> Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart
> =
> etastart, :
> cannot find valid starting values: please specify some
strt is set to be the coefficient for a similar fit
glm(S ~ -1 + I(Mdif + 1),...
i.e. (Mdif + 1) is a vector similar to Mdif.
The error appears to occur when some values of Mdif are negative,
though I have not had this problem with simulated datasets.
Any solutions greatly appreciated (full details and data below).
Dan Bebber
Department of Plant Sciences
University of Oxford
OS: WinXP Pro SP2 and Win ME (tried both)
Processor: Dual Intel Xeon and Pentium 4 (tried both)
R version: 2.3.0 and 2.3.1 (tried both)
#Full details (can be pasted directly into R):
#Data:
sdat <- data.frame(
S = c(0, 0, 0, 0, 28, 0, 1, 7, 0, 0, 39, 2, 0, 0, 40, 0, 0, 0, 0,
0, 0, 15, 0, 0, 3, 0, 0, 0, 2, 0, 3, 0, 30, 0, 20, 0, 1, 0, 0,
1, 21, 0, 0, 4, 14, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 3, 0,
2, 5, 0, 0, 0, 0, 0, 0, 0, 25, 0, 5, 0, 0, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 10, 0, 0, 0, 0, 0, 9, 1, 1, 1, 1, 0, 0, 3,
0, 27, 7, 0, 0, 0, 0, 0, 1, 1, 4, 2, 1, 2, 4, 1, 4, 6, 12, 4,
6, 3, 4, 0, 4, 0, 6, 1, 3, 1, 4, 4, 1, 1, 2, 1, 6, 1, 0, 0, 1,
0, 1, 0, 0, 6, 0, 0, 0, 0, 2, 2, 3, 1, 6, 2, 2, 1, 1, 4, 4, 3,
3, 7, 1, 3, 5, 6, 4, 0, 1, 4, 2, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
1, 0, 4, 0, 0, 1, 0, 2, 0, 0, 1, 2, 0, 4, 0, 2, 0, 3, 0, 2, 2,
0, 4, 0, 2, 0, 1, 2, 3, 0, 0, 0, 0, 3, 1, 0, 0, 0, 3, 2, 1, 2,
1, 1, 2, 0, 0, 3, 3, 1, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 3, 2, 0,
0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 4, 2, 4, 0, 2, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 1, 0, 0, 1),
M = 620+c(0,cumsum(sdat$S[-328])))
#S is the (unknown) number of N individuals that irreversibly change state
in a time
#interval t. The data provided are a subset of the full data set.
#M is the cumulative sum of individuals that have changed state up to t-1.
#Assume that the rate of state change is constant (S ~ kM), but the
#distribution of S is clustered.
#The goal is to estimate N.
#N can be estimated by fitting:
qpglm <- glm(S ~ M, family = quasipoisson(link = identity), sdat)
summary(qpglm)
N.est <- -coef(qpglm)[1]/coef(qpglm)[2]
N.est
#i.e. x-intercept is minus intercept/slope
#To estimate confidence limits on N.est, fit models without intercept to
#N.est - M + x, where x is an integer. The model will have the lowest
deviance
#when x = 0.
x <- 0
Mdif <- N.est - M + x
qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity), sdat)
summary(qpglm2)
#Use analysis of deviance to estimate confidence limits on N.est:
disp <- summary(qpglm)$dispersion
dfres <- df.residual(qpglm)
dev.res <- deviance(qpglm)
#From MASS4, p. 210, assume that changes in deviance scaled by
#dispersion as |x| increases have an F distribution
dev.crit <- dev.res+qf(0.95,1,dfres)*disp
dev.crit
#values of x for which the deviance = dev.crit give approximate 95%
confidence limits
#on N.est.
#The error occurs when x <= -91.7:
x <- -91.7
sdat$Mdif <- N.est - sdat$M + x
strt <- coef(glm(S ~ -1 + I(Mdif+1), family = quasipoisson(link = identity),
sdat))
qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity),
start=strt, sdat)
#The problem is that this interferes with optimization to find values of x
for which
#deviance = dev.crit
More information about the R-help
mailing list