[R] inverse prediction and Poisson regression

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Fri Jul 25 21:43:46 CEST 2003


Spencer Graves <spencer.graves at pdf.com> writes:

> The problem with nls is that it is NOT maximum likelihood for the
> Poisson distribution.  For the Poisson, the standard deviation is the
> square root of the mean, while nls assumes constant standard
> deviation. That's why I stayed with glm.  The answers should be
> comparable, but I would prefer the glm answers.  spencer graves

That's why I was suggesting either a variance stabilizing
transformation or using gnls() with a weights function. In both cases
you need to watch out for scaling of SE's stemming from the fact that
the variance is supposed to be known in the Poisson distribution, but
the fitting routines estimate a sigma from the residuals anyway. 

I.e. for instance:

> Phyto.nls2 <- nls(sqrt(y+.5) ~ sqrt(Ymax/(1 +
> x/x50)+.5),data=Phytopath,start=list(Ymax=20.0,x50=0.01),trace=T)
9.400602 :  20.00  0.01
0.9064723 :  27.21511020  0.03593643
0.06338235 :  28.21654101  0.05995647
0.02616589 :  28.50221759  0.06815302
0.02608587 :  28.54871243  0.06835046
0.02608586 :  28.54960242  0.06834083
0.02608586 :  28.5495605  0.0683413
> summary(Phyto.nls2)

Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Ymax 28.54956    1.65113  17.291   0.0368 *
x50   0.06834    0.01264   5.407   0.1164
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1615 on 1 degrees of freedom

Correlation of Parameter Estimates:
       Ymax
x50 -0.6833

but here you need to know that the theoretical sd is 0.5, so the Std.
errors all need multiplication by 0.5/0.1615.

Using gnls() we'd get (if I got the calling sequence right...)

> Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50),
+  data=Phytopath,weights=varPower(fixed=.5),start=list(Ymax=20.0,x50=0.01))
> summary(Phyto.gnls)
Generalized nonlinear least squares fit
  Model: y ~ Ymax/(1 + x/x50)
  Data: Phytopath
       AIC      BIC    logLik
  13.71292 11.00875 -3.856458

Variance function:
 Structure: Power of variance covariate
 Formula: ~fitted(.)
 Parameter estimates:
power
  0.5

Coefficients:
         Value Std.Error   t-value p-value
Ymax 29.393796 2.4828723 11.838626  0.0536
x50   0.063028 0.0125244  5.032376  0.1249

 Correlation:
    Ymax
x50 -0.84

Standardized residuals:
[1] -0.4894266  0.7621749 -0.4237346
attr(,"std")
[1] 2.8478142 1.4239071 0.8586483
attr(,"label")
[1] "Standardized residuals"

Residual standard error: 0.6367906
Degrees of freedom: 3 total; 1 residual

and again, it is necessary to adjust the SE's by multiplying with
1/0.6368

It is in fact rather easy to do direct MLE for this kind of model:

> with(Phytopath,
> optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
+ -sum(dpois(y,lambda=Ymax/(1
+ + x/x50),log=TRUE))}, method="BFGS", hessian=T))
$par
[1] 28.55963487  0.06833524

$value
[1] 7.21251

$counts
function gradient
      42       13

$convergence
[1] 0

$message
NULL

$hessian
           [,1]        [,2]
[1,] 0.07356072    6.631868
[2,] 6.63186764 1294.792539

Warning message:
NaNs produced in: dpois(x, lambda, log)

and we can get SE's from the inverse hessian with

> sqrt(diag(solve(with(Phytopath,
+ optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
+ -sum(dpois(y,lambda=Ymax/(1
+ + x/x50),log=TRUE))}, method="BFGS", hessian=T))$hessian)))
[1] 5.02565893 0.03788052

Notice that the variance stabilizing transform seems to do rather
better than gnls() compared to true MLE. I'm slightly puzzled by this,
but presumably it has to do with the fact that MLE for a Gaussian
model with a mean/variance relationship is *not* identical to the
iteratively reweighted least squares that glm() et al. are doing. (I
wouldn't quite rule out that I've simply made a mistake somewhere,
though...)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907




More information about the R-help mailing list