[R] inverse prediction and Poisson regression
Ravi Varadhan
rvaradha at jhsph.edu
Fri Jul 25 22:33:12 CEST 2003
Thanks, Peter, for the wonderful illustration of the various
model-fitting options for non-linear models.
Thinking about your comment that the "variance stabilizing" transform
does better than the weighted non-linear least squares: I interpret
this to mean that the residual standard error is smaller, 0.16 versus
0.64. A possible explanation is that in the former case the responses
are themselves smaller because of the square-rooting (range about
3-5), so the residuals are smaller, whereas in the "gnls" case the
response is the original variable (range about 10-30). Does this seem
reasonable?
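As a quick sanity check of this scale argument, here is a sketch in
Python with made-up numbers (not the actual Phytopath data): responses
around 10-30 compress to roughly 3-5 after square-rooting, so the
residual sum of squares on the transformed scale is smaller in
absolute terms even for a comparable relative fit.

```python
import math

# Hypothetical raw responses and fitted values, for illustration only.
y = [28.0, 14.0, 10.0]
fitted = [27.0, 15.0, 9.0]

# Residual sum of squares on the original scale (range ~10-30).
rss_raw = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Residual sum of squares after the sqrt(y + 0.5) transform (range ~3-5).
rss_sqrt = sum((math.sqrt(yi + 0.5) - math.sqrt(fi + 0.5)) ** 2
               for yi, fi in zip(y, fitted))

print(rss_raw)   # 3.0
print(rss_sqrt)  # far smaller, because sqrt compresses the scale
```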
Ravi.
----- Original Message -----
From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
Date: Friday, July 25, 2003 3:48 pm
Subject: Re: [R] inverse prediction and Poisson regression
> Spencer Graves <spencer.graves at pdf.com> writes:
>
> > The problem with nls is that it is NOT maximum likelihood for the
> > Poisson distribution. For the Poisson, the standard deviation
> is the
> > square root of the mean, while nls assumes constant standard
> > deviation. That's why I stayed with glm. The answers should be
> > comparable, but I would prefer the glm answers. spencer graves
>
> That's why I was suggesting either a variance stabilizing
> transformation or using gnls() with a weights function. In both cases
> you need to watch out for scaling of SE's stemming from the fact that
> the variance is supposed to be known in the Poisson distribution, but
> the fitting routines estimate a sigma from the residuals anyway.
>
> I.e. for instance:
>
> > Phyto.nls2 <- nls(sqrt(y+.5) ~ sqrt(Ymax/(1 + x/x50)+.5),
> + data=Phytopath, start=list(Ymax=20.0, x50=0.01), trace=T)
> 9.400602 : 20.00 0.01
> 0.9064723 : 27.21511020 0.03593643
> 0.06338235 : 28.21654101 0.05995647
> 0.02616589 : 28.50221759 0.06815302
> 0.02608587 : 28.54871243 0.06835046
> 0.02608586 : 28.54960242 0.06834083
> 0.02608586 : 28.5495605 0.0683413
> > summary(Phyto.nls2)
>
> Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)
>
> Parameters:
> Estimate Std. Error t value Pr(>|t|)
> Ymax 28.54956 1.65113 17.291 0.0368 *
> x50 0.06834 0.01264 5.407 0.1164
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.1615 on 1 degrees of freedom
>
> Correlation of Parameter Estimates:
> Ymax
> x50 -0.6833
>
> but here you need to know that the theoretical sd is 0.5, so the Std.
> errors all need multiplication by 0.5/0.1615.
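The 0.5 figure is the delta-method approximation: if Y ~ Poisson(lambda),
then Var(sqrt(Y)) is roughly (1/(2*sqrt(lambda)))^2 * lambda = 1/4, so
sd(sqrt(Y + 0.5)) is about 0.5 whatever the mean is. A quick simulation
sketch in Python (rather than R; lambda = 20 is an arbitrary choice and
the thread's data are not used) bears this out:

```python
import math
import random
import statistics

random.seed(1)
lam = 20.0  # arbitrary Poisson mean, for illustration only

def rpois(lam):
    """Draw one Poisson variate by Knuth's multiplication method
    (the Python stdlib has no Poisson sampler)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

draws = [rpois(lam) for _ in range(50000)]
sd_sqrt = statistics.stdev([math.sqrt(y + 0.5) for y in draws])
print(sd_sqrt)  # lands near the theoretical 0.5
```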
>
> Using gnls() we'd get (if I got the calling sequence right...)
>
> > Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50), data=Phytopath,
> + weights=varPower(fixed=.5), start=list(Ymax=20.0, x50=0.01))
> > summary(Phyto.gnls)
> Generalized nonlinear least squares fit
> Model: y ~ Ymax/(1 + x/x50)
> Data: Phytopath
> AIC BIC logLik
> 13.71292 11.00875 -3.856458
>
> Variance function:
> Structure: Power of variance covariate
> Formula: ~fitted(.)
> Parameter estimates:
> power
> 0.5
>
> Coefficients:
> Value Std.Error t-value p-value
> Ymax 29.393796 2.4828723 11.838626 0.0536
> x50 0.063028 0.0125244 5.032376 0.1249
>
> Correlation:
> Ymax
> x50 -0.84
>
> Standardized residuals:
> [1] -0.4894266 0.7621749 -0.4237346
> attr(,"std")
> [1] 2.8478142 1.4239071 0.8586483
> attr(,"label")
> [1] "Standardized residuals"
>
> Residual standard error: 0.6367906
> Degrees of freedom: 3 total; 1 residual
>
> and again, it is necessary to adjust the SE's by multiplying with
> 1/0.6368
>
> It is in fact rather easy to do direct MLE for this kind of model:
>
> > with(Phytopath,
> + optim(c(28,.7), fn=function(p){Ymax<-p[1]; x50<-p[2];
> + -sum(dpois(y, lambda=Ymax/(1 + x/x50), log=TRUE))},
> + method="BFGS", hessian=T))
> $par
> [1] 28.55963487 0.06833524
>
> $value
> [1] 7.21251
>
> $counts
> function gradient
> 42 13
>
> $convergence
> [1] 0
>
> $message
> NULL
>
> $hessian
> [,1] [,2]
> [1,] 0.07356072 6.631868
> [2,] 6.63186764 1294.792539
>
> Warning message:
> NaNs produced in: dpois(x, lambda, log)
>
> and we can get SE's from the inverse hessian with
>
> > sqrt(diag(solve(with(Phytopath,
> + optim(c(28,.7), fn=function(p){Ymax<-p[1]; x50<-p[2];
> + -sum(dpois(y, lambda=Ymax/(1 + x/x50), log=TRUE))},
> + method="BFGS", hessian=T))$hessian)))
> [1] 5.02565893 0.03788052
>
> Notice that the variance stabilizing transform seems to do rather
> better than gnls() compared to true MLE. I'm slightly puzzled by this,
> but presumably it has to do with the fact that MLE for a Gaussian
> model with a mean/variance relationship is *not* identical to the
> iteratively reweighted least squares that glm() et al. are doing. (I
> wouldn't quite rule out that I've simply made a mistake somewhere,
> though...)
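Putting the quoted numbers side by side makes the comparison concrete.
The following Python arithmetic uses only figures copied from the
outputs above, with each sigma-based rescaling applied as described in
the message (it introduces no new fits):

```python
# Standard errors for (Ymax, x50) from the three fits quoted above.
mle_se = (5.02565893, 0.03788052)        # from the inverse Hessian

# sqrt-transform fit: SEs rescaled by known sigma / estimated sigma.
transform_se = (1.65113 * 0.5 / 0.1615,
                0.01264 * 0.5 / 0.1615)

# gnls fit: SEs rescaled by 1 / estimated sigma.
gnls_se = (2.4828723 / 0.6367906,
           0.0125244 / 0.6367906)

for name, (se_ymax, se_x50) in [("MLE", mle_se),
                                ("sqrt transform", transform_se),
                                ("gnls", gnls_se)]:
    print(f"{name:15s} SE(Ymax)={se_ymax:.4f}  SE(x50)={se_x50:.5f}")
```

For Ymax, the rescaled transform SE (about 5.11) sits much closer to
the MLE value (about 5.03) than the rescaled gnls SE does (about 3.90),
which is the puzzle noted above.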
>
> --
> O__ ---- Peter Dalgaard Blegdamsvej 3
> c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>