[R] glm gives t test sometimes, z test others. Why?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Mar 5 08:52:25 CET 2006
First off, glm() does not report these at all. The summary() method
reports 't' and 'z' ratios, not tests (although they can be interpreted as
test statistics). That is important, for
1) Use of summary() is optional. You could use drop1() or car's Anova()
instead of summary to do a test, and use profile() rather than summary()
to construct confidence intervals. (And these days I normally do,
although a decade ago they could be too slow.)
2) summary.glm() has a 'dispersion' parameter. If the dispersion is
estimated this is labelled as a 't' ratio, otherwise as a 'z' ratio. The
quoted p-value is from a reference Student t in the first case and a
Normal in the second. So for a single glm() fit you may see either 't' or
'z' depending on how you use summary.glm().
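A minimal sketch of both points, assuming the Gamma fit from example(glm)
(the clotting data). The object names 'clotting', 'fit', 's_est' and
's_fixed' are only illustrative, and dispersion = 1 is an arbitrary value
chosen to show the relabelling, not a recommendation for these data.

```r
# Clotting data as used in example(glm)
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Point 2: the same fit, summarized two ways
s_est   <- summary(fit)                  # dispersion estimated from the fit
s_fixed <- summary(fit, dispersion = 1)  # dispersion supplied by the user

colnames(coef(s_est))    # third column is labelled "t value"
colnames(coef(s_fixed))  # third column is labelled "z value"

# Point 1: a test for each term without going through summary()
d1 <- drop1(fit, test = "F")
print(d1)
```

The coefficient estimates are identical in the two summaries; only the
standard errors, the label and the reference distribution change.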
BTW, summary.glm in S always labels these as 't values' (which they are
not always) but does not report p values, something that seems to me to be
wise. But I lost that argument for R in the 1990s.
On Sun, 5 Mar 2006, Paul Johnson wrote:
> I just ran example(glm) and happened to notice that models based on
> the Gamma distribution give a t test, while the Poisson models give a
> z test. Why?
>
> Both are b/s.e., aren't they?
In your terminology below, bhat/e.s.e.(bhat), the first 'e' being for
'estimated' (which may or may not be part of your definition of 'standard
error').
> I can't find documentation supporting the claim that the distribution
> is more like t in one case than another, except in the Gaussian case
> (where it really is t).
Hmm. Even in the Gaussian case it depends on whether the residual
variance is estimated or assumed known. summary.glm allows the estimation
of a Gaussian model with known sigma^2 whereas summary.lm does not.
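As a sketch of the Gaussian case, assuming simulated data in which sigma^2
really is 1 (the seed, sample size and the names 'x', 'y', 'gfit' and 'lfit'
are made up for illustration):

```r
set.seed(1)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30, sd = 1)   # true sigma^2 = 1 by construction

gfit <- glm(y ~ x, family = gaussian)
lfit <- lm(y ~ x)

s_unknown <- summary(gfit)                 # sigma^2 estimated: 't value'
s_known   <- summary(gfit, dispersion = 1) # sigma^2 taken as known: 'z value'
s_lm      <- summary(lfit)                 # summary.lm always estimates sigma^2

colnames(coef(s_unknown))[3]
colnames(coef(s_known))[3]

# With sigma^2 estimated, glm and lm give the same t ratios
all.equal(coef(s_unknown)[, "t value"], coef(s_lm)[, "t value"])
```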
There is some support that where the dispersion is estimated, the
reference t is more accurate than a Normal would be. I am not in my
office, but believe you will find the arguments in McCullagh & Nelder
(1989). Note though that for families other than the Gaussian the
dispersion estimate is not the MLE and other estimates may be preferable.
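To make this concrete, a small sketch, again assuming the Gamma fit to the
clotting data from example(glm); the names 'fit', 's', 'pearson_disp', 'p_t'
and 'p_z' are illustrative. It checks that the dispersion reported by
summary.glm is the Pearson chi-squared statistic divided by the residual
degrees of freedom (a moment estimate, not the MLE), and compares the
p-values obtained from the t and Normal references for the same ratios.

```r
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)
s   <- summary(fit)

# Dispersion used by summary.glm: Pearson chi-squared / residual df
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
all.equal(s$dispersion, pearson_disp)

# Same ratios, two reference distributions
tval <- coef(s)[, "t value"]
p_t  <- 2 * pt(-abs(tval), df = df.residual(fit))  # what summary() quotes here
p_z  <- 2 * pnorm(-abs(tval))                      # what a Normal would give
cbind(p_t, p_z)
```

The t reference has heavier tails, so its p-values are never smaller than
the Normal ones for the same ratio.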
> Aren't all of the others approximations based on the Wald idea that
>
> bhat^2
> ------------
> Var(bhat)
>
> is asymptotically Chi-square?
Not really, more that bhat - b is asymptotically or exactly normal with
computable variance which in general depends on the unknown true
parameters. So you have to replace your denominator by an estimate of it,
and in general you increase the variability if you do not know the
dispersion.
> And that sqrt(Chi-square) is Normal.
Hmm, Normal^2 is chisq_1, but the 1 is crucial.
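A quick numerical check of that statement (the values in 'z' are arbitrary,
chosen only for illustration):

```r
z <- c(-2.5, -1, 0.3, 1.96)

# Two-sided Normal tail area for z ...
p_norm  <- 2 * pnorm(-abs(z))
# ... equals the upper tail of chi-squared on 1 df evaluated at z^2
p_chisq1 <- pchisq(z^2, df = 1, lower.tail = FALSE)
all.equal(p_norm, p_chisq1)

# With any other df the correspondence fails
p_chisq2 <- pchisq(z^2, df = 2, lower.tail = FALSE)
isTRUE(all.equal(p_norm, p_chisq2))
```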
>
> While I'm asking, I wonder if glm should report them at all. I've
> followed up on Prof Ripley's advice to read the Hauck & Donner article
> and the successors, and I'm persuaded that we ought to just use the
> likelihood ratio test to decide about individual parameters.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595