# [R] inverse prediction and Poisson regression

Fri Jul 25 22:33:12 CEST 2003

```
Thanks Peter, for the wonderful illustration of various model-fitting
options for the non-linear models.

You mention that the variance-stabilizing transformation does better
than the weighted non-linear least-squares - I am interpreting this to
mean that the residual standard error is smaller, 0.16 versus 0.64.  A
possible explanation may be that in the former case the responses are
actually smaller because of the square-rooting (range about 3-5), so
the residuals are smaller, whereas in the "gnls" case the response is
the original variable (range about 10-30). Does this seem reasonable?

Ravi.

----- Original Message -----
From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
Date: Friday, July 25, 2003 3:48 pm
Subject: Re: [R] inverse prediction and Poisson regression

> Spencer Graves <spencer.graves at pdf.com> writes:
>
> > The problem with nls is that it is NOT maximum likelihood for the
> > Poisson distribution.  For the Poisson, the standard deviation
> is the
> > square root of the mean, while nls assumes constant standard
> > deviation. That's why I stayed with glm.  The answers should be
> > comparable, but I would prefer the glm answers.  spencer graves
>
> That's why I was suggesting either a variance stabilizing
> transformation or using gnls() with a weights function. In both cases
> you need to watch out for scaling of SE's stemming from the fact that
> the variance is supposed to be known in the Poisson distribution, but
> the fitting routines estimate a sigma from the residuals anyway.
>
> I.e. for instance:
>
> > Phyto.nls2 <- nls(sqrt(y + .5) ~ sqrt(Ymax/(1 + x/x50) + .5),
> +                   data = Phytopath,
> +                   start = list(Ymax = 20.0, x50 = 0.01), trace = TRUE)
> 9.400602 :  20.00  0.01
> 0.9064723 :  27.21511020  0.03593643
> 0.06338235 :  28.21654101  0.05995647
> 0.02616589 :  28.50221759  0.06815302
> 0.02608587 :  28.54871243  0.06835046
> 0.02608586 :  28.54960242  0.06834083
> 0.02608586 :  28.5495605  0.0683413
> > summary(Phyto.nls2)
>
> Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)
>
> Parameters:
>     Estimate Std. Error t value Pr(>|t|)
> Ymax 28.54956    1.65113  17.291   0.0368 *
> x50   0.06834    0.01264   5.407   0.1164
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.1615 on 1 degrees of freedom
>
> Correlation of Parameter Estimates:
>       Ymax
> x50 -0.6833
>
> but here you need to know that the theoretical sd is 0.5, so the Std.
> errors all need multiplication by 0.5/0.1615.
>
> Using gnls() we'd get (if I got the calling sequence right...)
>
> > Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50), data = Phytopath,
> +                    weights = varPower(fixed = .5),
> +                    start = list(Ymax = 20.0, x50 = 0.01))
> > summary(Phyto.gnls)
> Generalized nonlinear least squares fit
>  Model: y ~ Ymax/(1 + x/x50)
>  Data: Phytopath
>       AIC      BIC    logLik
>  13.71292 11.00875 -3.856458
>
> Variance function:
> Structure: Power of variance covariate
> Formula: ~fitted(.)
> Parameter estimates:
> power
>  0.5
>
> Coefficients:
>         Value Std.Error   t-value p-value
> Ymax 29.393796 2.4828723 11.838626  0.0536
> x50   0.063028 0.0125244  5.032376  0.1249
>
> Correlation:
>    Ymax
> x50 -0.84
>
> Standardized residuals:
>  -0.4894266  0.7621749 -0.4237346
> attr(,"std")
>  2.8478142 1.4239071 0.8586483
> attr(,"label")
>  "Standardized residuals"
>
> Residual standard error: 0.6367906
> Degrees of freedom: 3 total; 1 residual
>
> and again, it is necessary to adjust the SE's by multiplying with
> 1/0.6368
>
> It is in fact rather easy to do direct MLE for this kind of model:
>
> > with(Phytopath,
> +   optim(c(28, .7), fn = function(p) {Ymax <- p[1]; x50 <- p[2];
> +     -sum(dpois(y, lambda = Ymax/(1 + x/x50), log = TRUE))},
> +     method = "BFGS", hessian = TRUE))
> $par
>  28.55963487  0.06833524
>
> $value
>  7.21251
>
> $counts
>      42       13
>
> $convergence
>  0
>
> $message
> NULL
>
> $hessian
>           [,1]        [,2]
> [1,] 0.07356072    6.631868
> [2,] 6.63186764 1294.792539
>
> Warning message:
> NaNs produced in: dpois(x, lambda, log)
>
> and we can get SE's from the inverse hessian with
>
> > sqrt(diag(solve(with(Phytopath,
> +   optim(c(28, .7), fn = function(p) {Ymax <- p[1]; x50 <- p[2];
> +     -sum(dpois(y, lambda = Ymax/(1 + x/x50), log = TRUE))},
> +     method = "BFGS", hessian = TRUE))$hessian)))
>  5.02565893 0.03788052
>
> Notice that the variance stabilizing transform seems to do rather
> better than gnls() compared to true MLE. I'm slightly puzzled by this,
> but presumably it has to do with the fact that MLE for a Gaussian
> model with a mean/variance relationship is *not* identical to the
> iteratively reweighted least squares that glm() et al. are doing. (I
> wouldn't quite rule out that I've simply made a mistake somewhere,
> though...)
>
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
>

```
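A quick numerical check of the SE rescaling Peter describes (an editorial sketch, not part of the original exchange; the numbers are copied from the `summary(Phyto.nls2)` output above):

```r
# Rescaling nls() standard errors when sigma is theoretically known:
# sqrt(y + 0.5) of a Poisson count has sd close to 0.5, but nls()
# estimated sigma = 0.1615 from the residuals, so the reported SEs
# are too small by the factor 0.1615/0.5.
est_sigma  <- 0.1615                         # residual SE from summary(Phyto.nls2)
theo_sigma <- 0.5                            # theoretical sd on the sqrt scale
se_nls <- c(Ymax = 1.65113, x50 = 0.01264)   # SEs reported by summary(Phyto.nls2)
se_adj <- se_nls * theo_sigma / est_sigma    # rescaled standard errors
round(se_adj, 4)
```

The rescaled values, about 5.11 for `Ymax` and 0.0391 for `x50`, agree well with the hessian-based MLE standard errors (5.026 and 0.0379) obtained from `optim()` later in the thread, which supports the transformed-nls approach.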
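The "theoretical sd is 0.5" in that rescaling follows from the delta method: if Y ~ Poisson(lambda), then sd(sqrt(Y)) is approximately sqrt(lambda) * 1/(2*sqrt(lambda)) = 1/2 for moderate lambda. A small simulation (an editorial addition, not from the thread) checks this for the transform actually used:

```r
# Check that sqrt(y + 0.5) of a Poisson variable has sd near 0.5
# regardless of lambda -- i.e. the transform stabilizes the variance.
set.seed(1)
sds <- sapply(c(5, 10, 20),
              function(lambda) sd(sqrt(rpois(1e5, lambda) + 0.5)))
round(sds, 3)   # all close to 0.5
```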
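One remark on why `glm()` is applicable to this model at all: the hyperbolic mean Ymax/(1 + x/x50) is linear in x on the inverse scale, since 1/mu = 1/Ymax + x/(Ymax*x50). R's `poisson()` family does not offer the inverse link, but `quasi(link = "inverse", variance = "mu")` yields the same IRLS point estimates as Poisson ML. The Phytopath data are not included in the thread, so the sketch below (an editorial construction) uses stand-in counts generated from roughly the fitted curve:

```r
# Fit the hyperbolic Poisson mean as a GLM with inverse link.
Ymax <- 28.55; x50 <- 0.0683        # roughly the fitted values from the thread
x <- c(0.02, 0.06, 0.2)             # hypothetical covariate values
y <- round(Ymax / (1 + x / x50))    # stand-in counts (real data not shown)

# quasi with variance "mu" reproduces Poisson IRLS point estimates
fit <- glm(y ~ x, family = quasi(link = "inverse", variance = "mu"))
b <- coef(fit)

# back-transform the linear coefficients to the nonlinear parameters:
Ymax_hat <- 1 / b[["(Intercept)"]]          # intercept of 1/mu is 1/Ymax
x50_hat  <- b[["(Intercept)"]] / b[["x"]]   # slope of 1/mu is 1/(Ymax*x50)
```

With real data this reparameterization would let the whole comparison (glm vs. nls vs. gnls vs. direct MLE) be run on one model; the standard errors of `Ymax_hat` and `x50_hat` would then follow from the GLM covariance matrix by the delta method.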