[Rd] Standardized Pearson residuals

Thu Mar 17 16:14:09 CET 2011

>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>     on Thu, 17 Mar 2011 15:45:01 +0100 writes:

    > On Mar 16, 2011, at 23:34 , John Maindonald wrote:

    >> One can easily test for the binary case and not give the
    >> statistic in that case.

    > Warning if expected cell counts < 5 would be another
    > possibility.
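For illustration, the two safeguards mentioned here (skipping the statistic for binary data, and warning on small expected counts) could be sketched as follows. The function name `check_expected_counts()` and the default threshold of 5 are hypothetical, and the binary test assumes that prior weights of 1 everywhere mean a 0/1 response; this is a sketch, not anything in R itself.

```r
## Hypothetical helper: is a Pearson chi-square statistic reasonable to report?
## Returns TRUE/FALSE invisibly and warns when the answer is FALSE.
check_expected_counts <- function(model, threshold = 5) {
    stopifnot(inherits(model, "glm"))
    fam <- family(model)$family
    mu  <- fitted(model)
    if (fam %in% c("binomial", "quasibinomial")) {
        n <- weights(model, type = "prior")   # binomial denominators
        if (all(n == 1)) {                    # assumption: this means binary data
            warning("binary response: Pearson chi-square is not informative")
            return(invisible(FALSE))
        }
        ## expected successes and expected failures per observation
        expected <- c(n * mu, n * (1 - mu))
    } else {
        expected <- mu                        # e.g. expected Poisson counts
    }
    ok <- all(expected >= threshold)
    if (!ok)
        warning("some expected cell counts are below ", threshold)
    invisible(ok)
}
```

A caller would run the check first and only print the statistic when it returns TRUE.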

    >> 
    >> A general point is that if one gave no output that was not open
    >> to abuse, there'd be nothing given at all!  One would not be
    >> giving any output at all from poisson or binomial models, given
    >> that data that really calls for quasi links (or a glmm with
    >> observation level random effects) is in my experience the rule
    >> rather than the exception!

    > Hmmm. Not sure I agree on that entirely, but that's a different
    > discussion.

    >> At the very least, why not a function dispersion() or
    >> pearsonchisquare() that gives this information.

    > Lots of options here.... Offhand, my preference would go to
    > something like anova(..., test="score") and/or an extra line in
    > summary(). It's not a computationally intensive item as far as I
    > can see, it's more about "output real estate" -- how "SAS-like"
    > do we want to become?
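For illustration, a standalone function of the kind John suggests could look roughly like this. The name `pearson_chisq()` and the shape of its return value are hypothetical; the sketch only relies on `residuals(type = "pearson")` and `df.residual()`, and assumes a fitted "glm" object.

```r
## Hypothetical helper: Pearson chi-square statistic and the dispersion
## estimate derived from it (statistic divided by residual df).
pearson_chisq <- function(model) {
    stopifnot(inherits(model, "glm"))
    X2 <- sum(residuals(model, type = "pearson")^2)
    df <- df.residual(model)
    list(statistic = X2, df = df, dispersion = X2 / df)
}

## Usage with the Dobson (1990) counts from the ?glm examples
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())
pearson_chisq(fit)
```

For quasi families, the `dispersion` component should agree with `summary(fit)$dispersion`, which is estimated from the same Pearson statistic.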

    >> Apologies that I misattributed this.

    > Never mind...

    > Back to the original question:

    > The current rstandard() code reads

## FIXME ! -- make sure we are following "the literature":
rstandard.glm <- function(model, infl = lm.influence(model, do.coef=FALSE), ...)
{
    res <- infl$wt.res # = "dev.res"  really
    res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
    res[is.infinite(res)] <- NaN
    res
}

    > which is "svn blame" to ripley but that is due to the 2003
    > code reorganization (except for the infinity check from
    > 2005). So apparently, we have had that FIXME since
    > forever... and finding its author appears to be awkward
    > (Maechler, perhaps?).

yes, almost surely

    > I did try Brett's code in lieu of the above (with a mod to
    > handle $dispersion) and even switched the default to use
    > the Pearson residuals. Make check-devel sailed straight
    > through apart from the obvious code/doc mismatch, so we
    > don't have any checks in place nor any examples using
    > rstandard(). I rather strongly suspect that there aren't
    > many user codes using it either.
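For illustration, a Pearson-based variant of the standardization above might be sketched as below. This is not Brett's code, which is not shown in this message; the name `rstandard_pearson()` is hypothetical, and the sketch simply swaps Pearson residuals into the same formula, dividing by sqrt(dispersion * (1 - hat)).

```r
## Hypothetical sketch: standardized Pearson residuals for a fitted glm.
rstandard_pearson <- function(model) {
    stopifnot(inherits(model, "glm"))
    r   <- residuals(model, type = "pearson")  # Pearson residuals
    h   <- hatvalues(model)                    # leverages
    phi <- summary(model)$dispersion           # 1 for poisson/binomial
    res <- r / sqrt(phi * (1 - h))
    res[is.infinite(res)] <- NaN               # same guard as rstandard.glm
    res
}

## Usage with the Dobson (1990) counts from the ?glm examples
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())
rstandard_pearson(fit)
```

Going through `summary(model)$dispersion` keeps the behaviour consistent for quasi families, where the dispersion is estimated rather than fixed at 1.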

    > It is quite tempting simply to commit the change (after
    > updating the docs). One thing holding me back though: I
    > don't know what "the literature" refers to.

well, "the relevant publications on the topic" ...
and now define that (e.g. using the three 'References' on the
help page).
Really, that's what I think I meant when I (think I) wrote that FIXME.
The point then I think was that we had code "donations", and they
partly were clearly providing functionality that was (tested)
"correct" (according to e.g. McCullagh & Nelder and probably
another one or two textbooks I would have consulted ... no
large Wikipedia back then), 
but also provided things for which there was nothing in "the
literature", but as the author provided them with other good
code, we would have put it in, as well....
== my vague recollection from the past

Martin

    > -- 
    > Peter Dalgaard Center for Statistics, Copenhagen Business
    > School Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv:
    > PDalgd at gmail.com
