[R] glm gives t test sometimes, z test others. Why?

Sun Mar 5 08:52:25 CET 2006

First off, glm() does not report these at all.  The summary() method 
reports 't' and 'z' ratios, not tests (although they can be interpreted as 
test statistics).  That is important, for two reasons:

1) Use of summary() is optional.  You could use drop1() or car's Anova() 
instead of summary() to do a test, and use profile() rather than summary() 
to construct confidence intervals.  (And these days I normally do, 
although a decade ago they could be too slow.)
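
A minimal sketch of the alternatives in point 1, using the Poisson data 
from example(glm).  The object names (fit, lrt, ci) are illustrative 
only.  confint() on a glm fit profiles the likelihood, so it stands in 
here for calling profile() directly.

```r
# Poisson data from example(glm)
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Likelihood ratio test for dropping each term in turn
lrt <- drop1(fit, test = "Chisq")
print(lrt)

# Profile-likelihood confidence intervals for the coefficients
# (suppressMessages() only hides the "Waiting for profiling" note)
ci <- suppressMessages(confint(fit))
print(ci)
```

If the car package is installed, car::Anova(fit) is the analogous call 
for term-wise tests.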

2) summary.glm() has a 'dispersion' argument.  If the dispersion is 
estimated the ratio is labelled 't', otherwise it is labelled 'z'.  The 
quoted p-value is from a reference Student t in the first case and a 
Normal in the second.  So for a single glm() fit you may see either 't' or 
'z' depending on how you call summary.glm().
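
As a rough illustration of point 2, the sketch below summarizes one 
Gamma fit (data from example(glm)) twice: once with the dispersion 
estimated, once with it supplied.  The supplied value 0.0025 is an 
arbitrary assumption chosen for the example, not a recommendation.

```r
# Gamma data from example(glm)
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Dispersion estimated (the default for Gamma): 't value' column
est <- coef(summary(fit))

# Dispersion supplied as known: 'z value' column
# (0.0025 is an arbitrary illustrative value)
fixed <- coef(summary(fit, dispersion = 0.0025))

print(colnames(est))
print(colnames(fixed))

# Reproduce the quoted p-values from the two reference distributions
p_t <- 2 * pt(-abs(est[, "t value"]), df.residual(fit))
p_z <- 2 * pnorm(-abs(fixed[, "z value"]))
```

The same fitted object gives either label; only the call to summary() 
changes.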

BTW, summary.glm in S always labels these as 't values' (which they are 
not always) but does not report p values, something that seems to me to be 
wise.  But I lost that argument for R in the 1990s.

On Sun, 5 Mar 2006, Paul Johnson wrote:

> I just ran example(glm) and happened to notice that models based on
> the Gamma distribution gives a t test, while the Poisson models give a
> z test. Why?
>
> Both are b/s.e., aren't they?

In your terminology below, bhat/e.s.e.(bhat), the first 'e' being for 
'estimated' (which may or may not be part of your definition of 'standard 
error').

> I can't find documentation supporting the claim that the distribution
> is more like t in one case than another, except in the Gaussian case
> (where it really is t).

Hmm.  Even in the Gaussian case it depends on whether the residual 
variance is estimated or assumed known.  summary.glm allows a Gaussian 
model to be summarized with known sigma^2, whereas summary.lm does not.
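
A small sketch of the Gaussian case, on simulated data (the data, the 
seed and the "known" value sigma^2 = 1 are all assumptions made for 
illustration):

```r
set.seed(1)                     # simulated data, for illustration only
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)      # error variance is 1 by construction

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian())

# Residual variance estimated: summary.glm agrees with summary.lm
t_lm  <- coef(summary(fit_lm))[, "t value"]
t_glm <- coef(summary(fit_glm))[, "t value"]

# sigma^2 treated as known (here 1): ratios are labelled 'z value'
known <- coef(summary(fit_glm, dispersion = 1))
print(colnames(known))
```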

There is some support for the view that, where the dispersion is 
estimated, the reference t is more accurate than a Normal would be.  I am 
not in my office, but I believe you will find the arguments in McCullagh & Nelder 
(1989).  Note though that for families other than the Gaussian the 
dispersion estimate is not the MLE and other estimates may be preferable.
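
For concreteness, the dispersion that summary.glm() reports is the 
Pearson chi-squared statistic divided by the residual degrees of 
freedom.  The sketch below checks this on the Gamma fit from 
example(glm).  The deviance-based figure is shown only as one 
alternative estimate for comparison, with no claim that it is the 
preferable one.

```r
# Gamma data from example(glm)
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Pearson chi-squared statistic over residual degrees of freedom
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Dispersion reported by summary.glm()
summary_disp <- summary(fit)$dispersion

# One alternative estimate: residual deviance over residual df
deviance_disp <- deviance(fit) / df.residual(fit)

print(c(pearson = pearson_disp, summary = summary_disp,
        deviance = deviance_disp))
```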

> Aren't all of the others approximations based on the Wald idea that
>
>    bhat^2
>   ------------
>    Var(bhat)
>
> is asymptotically Chi-square?

Not really, more that bhat - b is asymptotically or exactly normal with 
computable variance, which in general depends on the unknown true 
parameters.  So you have to replace your denominator by an estimate of it, 
and in general you increase the variability if you do not know the 
dispersion.

> And that sqrt(Chi-square) is Normal.

Hmm, Normal^2 is chisq_1, but the 1 is crucial.
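
A quick numerical check of that relationship, for an arbitrary ratio 
(the values of z and df below are hypothetical).  With known dispersion 
the two-sided Normal p-value matches chi-squared on 1 df for the squared 
ratio.  With estimated dispersion the corresponding pair is Student t 
and F on (1, df), and the p-value is larger.

```r
z  <- 1.7   # arbitrary ratio bhat / e.s.e.(bhat), for illustration
df <- 7     # hypothetical residual degrees of freedom

# Dispersion known: Normal for z, chi-squared on 1 df for z^2
p_norm  <- 2 * pnorm(-abs(z))
p_chisq <- pchisq(z^2, df = 1, lower.tail = FALSE)

# Dispersion estimated: Student t for z, F(1, df) for z^2
p_t <- 2 * pt(-abs(z), df)
p_f <- pf(z^2, 1, df, lower.tail = FALSE)

print(c(normal = p_norm, chisq1 = p_chisq, t = p_t, F = p_f))
```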

>
> While I'm asking, I wonder if glm should report them at all. I've
> followed up on Prof Ripley's advice to read the Hauck & Donner article
> and the successors, and I'm persuaded that we ought to just use the
> likelihood ratio test to decide about individual parameters.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595