[R] inverse prediction and Poisson regression
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Fri Jul 25 21:43:46 CEST 2003
Spencer Graves <spencer.graves at pdf.com> writes:
> The problem with nls is that it is NOT maximum likelihood for the
> Poisson distribution. For the Poisson, the standard deviation is the
> square root of the mean, while nls assumes constant standard
> deviation. That's why I stayed with glm. The answers should be
> comparable, but I would prefer the glm answers. spencer graves
That's why I was suggesting either a variance stabilizing
transformation or using gnls() with a weights function. In both cases
you need to watch out for scaling of SE's stemming from the fact that
the variance is supposed to be known in the Poisson distribution, but
the fitting routines estimate a sigma from the residuals anyway.
I.e. for instance:
> Phyto.nls2 <- nls(sqrt(y+.5) ~ sqrt(Ymax/(1 +
> x/x50)+.5),data=Phytopath,start=list(Ymax=20.0,x50=0.01),trace=T)
9.400602 : 20.00 0.01
0.9064723 : 27.21511020 0.03593643
0.06338235 : 28.21654101 0.05995647
0.02616589 : 28.50221759 0.06815302
0.02608587 : 28.54871243 0.06835046
0.02608586 : 28.54960242 0.06834083
0.02608586 : 28.5495605 0.0683413
> summary(Phyto.nls2)
Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Ymax 28.54956 1.65113 17.291 0.0368 *
x50 0.06834 0.01264 5.407 0.1164
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.1615 on 1 degrees of freedom
Correlation of Parameter Estimates:
Ymax
x50 -0.6833
but here you need to know that the theoretical sd is 0.5, so the Std.
errors all need multiplication by 0.5/0.1615.
Using gnls() we'd get (if I got the calling sequence right...)
> Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50),
+ data=Phytopath,weights=varPower(fixed=.5),start=list(Ymax=20.0,x50=0.01))
> summary(Phyto.gnls)
Generalized nonlinear least squares fit
Model: y ~ Ymax/(1 + x/x50)
Data: Phytopath
AIC BIC logLik
13.71292 11.00875 -3.856458
Variance function:
Structure: Power of variance covariate
Formula: ~fitted(.)
Parameter estimates:
power
0.5
Coefficients:
Value Std.Error t-value p-value
Ymax 29.393796 2.4828723 11.838626 0.0536
x50 0.063028 0.0125244 5.032376 0.1249
Correlation:
Ymax
x50 -0.84
Standardized residuals:
[1] -0.4894266 0.7621749 -0.4237346
attr(,"std")
[1] 2.8478142 1.4239071 0.8586483
attr(,"label")
[1] "Standardized residuals"
Residual standard error: 0.6367906
Degrees of freedom: 3 total; 1 residual
and again, it is necessary to adjust the SE's by multiplying with
1/0.6368
It is in fact rather easy to do direct MLE for this kind of model:
> with(Phytopath,
> optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
+ -sum(dpois(y,lambda=Ymax/(1
+ + x/x50),log=TRUE))}, method="BFGS", hessian=T))
$par
[1] 28.55963487 0.06833524
$value
[1] 7.21251
$counts
function gradient
42 13
$convergence
[1] 0
$message
NULL
$hessian
[,1] [,2]
[1,] 0.07356072 6.631868
[2,] 6.63186764 1294.792539
Warning message:
NaNs produced in: dpois(x, lambda, log)
and we can get SE's from the inverse hessian with
> sqrt(diag(solve(with(Phytopath,
+ optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
+ -sum(dpois(y,lambda=Ymax/(1
+ + x/x50),log=TRUE))}, method="BFGS", hessian=T))$hessian)))
[1] 5.02565893 0.03788052
Notice that the variance stabilizing transform seems to do rather
better than gnls() compared to true MLE. I'm slightly puzzled by this,
but presumably it has to do with the fact that MLE for a Gaussian
model with a mean/variance relationship is *not* identical to the
iteratively reweighted least squares that glm() et al. are doing. (I
wouldn't quite rule out that I've simply made a mistake somewhere,
though...)
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list