[R] "prediction intervals for glm"
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Thu May 1 14:29:32 CEST 2003
On Thu, 1 May 2003, Fredrik Lundgren wrote:
> I wouldn't know anything about the theoretical problems with glm and a
> binary outcome, but there is a "prediction interval" in predict.glm of
> S-Plus (version 6.02 or thereabouts). I have failed to port it to R (and I do
> have difficulties with the higher forms of matrix manipulation). In the
> medical field where I work, I think it would be very valuable to generate
> "prediction intervals" for risk and benefit calculations for individual
> patients. If a prediction interval is theoretically fishy or unsound,
> maybe some bootstrap approach could do the trick?
It's more than fishy ... it uses the normal approximation on the link scale (as
I recall), which is very unlikely to be valid except for the gaussian
family. Indeed, for 0/1 data the interval will have coverage exactly 0, since
it lies strictly inside (0, 1) and so can never contain an observed 0 or 1.
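A minimal sketch of why the coverage is exactly 0, assuming a logit link and a Wald-type interval eta_hat +- 1.96 * se on the link scale mapped back through the inverse link. In R, eta_hat and se would come from something like predict(fit, type = "link", se.fit = TRUE); the values below are made up for illustration, and the helper names are hypothetical.

```python
import math
from typing import Tuple


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def link_scale_interval(eta_hat: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Hypothetical helper: normal-approximation interval on the link scale,
    back-transformed to the probability scale."""
    return inv_logit(eta_hat - z * se), inv_logit(eta_hat + z * se)


def covers(interval: Tuple[float, float], y: int) -> bool:
    """Does the closed interval contain the observed value y?"""
    lo, hi = interval
    return lo <= y <= hi


# Illustrative (made-up) linear predictors and standard errors.
cases = [(-2.0, 0.3), (0.0, 0.5), (1.5, 1.0), (4.0, 2.0)]
for eta_hat, se in cases:
    iv = link_scale_interval(eta_hat, se)
    # Both endpoints lie strictly inside (0, 1) ...
    assert 0.0 < iv[0] <= iv[1] < 1.0
    # ... so neither possible 0/1 outcome is ever covered.
    print(f"eta={eta_hat:+.1f} se={se}: ({iv[0]:.3f}, {iv[1]:.3f}) "
          f"covers 0: {covers(iv, 0)}, covers 1: {covers(iv, 1)}")
```

The same argument holds for any link whose inverse maps into the open interval (0, 1), such as probit or cloglog.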
I don't see how a bootstrap would help either: the issue is to combine the
(reasonably well-known) uncertainty in the prediction of the mean with the
variability in the observation. That would be easy to do by simulation,
but not by resampling. (Or did you think all simulation-based inference
was `some bootstrap approach'?) However, you are not going to be able to
summarize that predictive distribution as an *interval* for 0/1 data.
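A minimal sketch of the simulation described above, assuming a logit link and that the uncertainty in the fitted linear predictor is approximately Normal with mean eta_hat and standard deviation se (the "reasonably well-known" part). Each draw first samples a plausible mean, then a 0/1 observation from it. The helper name and the numbers are hypothetical, chosen only for illustration.

```python
import math
import random
from collections import Counter


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_predictive(eta_hat, se, n_sims=100_000, seed=1):
    """Hypothetical helper: simulate the predictive distribution of a new 0/1
    observation, combining uncertainty in the fitted mean with Bernoulli noise."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        eta = rng.gauss(eta_hat, se)      # uncertainty in the prediction of the mean
        p = inv_logit(eta)                # a plausible success probability
        y = 1 if rng.random() < p else 0  # variability in the observation itself
        counts[y] += 1
    return {y: counts[y] / n_sims for y in (0, 1)}


pred = simulate_predictive(eta_hat=0.8, se=0.4)
print(pred)  # two probabilities, for Y = 0 and Y = 1
```

Whatever the inputs, the simulated predictive distribution only puts mass on 0 and 1, so its natural summary is the predictive probability P(Y = 1) rather than an interval.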
> Sincerely Fredrik Lundgren
> ----- Original Message -----
> From: "Peter Dalgaard BSA" <p.dalgaard at biostat.ku.dk>
> To: "Spencer Graves" <spencer.graves at pdf.com>
> Cc: "Fredrik Lundgren" <fredrik.lundgren at norrkoping.mail.telia.com>; <R-help at stat.math.ethz.ch>
> Sent: Tuesday, April 29, 2003 4:48 PM
> Subject: Re: [R] "prediction intervals for glm"
> > Spencer Graves <spencer.graves at pdf.com> writes:
> > > "?predict.glm" produced something in my copy of R 1.6.2 under Windows
> > > 2000.
> > ... but probably not what Fredrik wanted. Prediction intervals (i.e.
> > intervals with 95% probability of catching a new observation) are
> > somewhat tricky even to define for glms. For Normal responses you have
> > the formula yhat +- qt(.975, df) * sqrt(s^2 + se(yhat)^2); for other
> > continuous responses that would become (approximately!) the error
> > distribution convolved with a Gaussian density; for discrete responses,
> > say 0/1, I wouldn't know what to do.
> > >
> > > Fredrik Lundgren wrote:
> > > > Where can I find prediction intervals for glm in R?
> > --
> > O__ ---- Peter Dalgaard Blegdamsvej 3
> > c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
> > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
> R-help at stat.math.ethz.ch mailing list
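The Normal-response formula quoted from Peter Dalgaard can be written out as follows. It rests on a standard linear-model fact: a new observation y_0 is independent of the fitted value, so the variance of y_0 minus the fitted value is sigma^2 + se(yhat_0)^2, and with sigma^2 estimated by s^2 on df residual degrees of freedom the studentized difference has a t distribution. Here t_{0.975, df} corresponds to qt(.975, df) in R.

```latex
$$
\hat y_0 \pm t_{0.975,\,\mathrm{df}} \sqrt{s^2 + \mathrm{se}(\hat y_0)^2},
\qquad
\frac{y_0 - \hat y_0}{\sqrt{s^2 + \mathrm{se}(\hat y_0)^2}} \sim t_{\mathrm{df}}
$$
```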
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595