[Rd] How do I know if the deviance of a glm fit was fixed?

Gordon Smyth smyth@wehi.edu.au
Sat, 19 Jan 2002 20:41:41 +1100


Thanks for your reply. My experience is mainly with S-Plus, so I must admit
that I am coming from an uninformed viewpoint as far as glm functions in R
are concerned. My apologies for this.

At 07:31 AM 19/01/2002 +0000, you wrote:
>On Sat, 19 Jan 2002, Gordon Smyth wrote:
>
> > I think that it could improve the glm family function to include an
> > argument or attribute that tells explicitly that the dispersion is fixed,
> > equivalent to the $scale declaration in GLIM. This would be set T by
> > default for the binomial and Poisson families and F by default for most
> > other families. This could be done I think in a way which would add extra
> > features but would not contradict the S-Plus "Blue Book". In effect, I
> > would like the dispersion to be considered part of the "fit" of a glm
> > because it is needed for standard errors and hypothesis testing.
>
>You would need to say what the fixed value is ....  So the obvious thing
>would be to add a dispersion argument as used in summary.glm, predict.glm
>and anova.glm, with the same semantics.
>
>The argument against is S-compatibility. Many R users also use S-PLUS, and
>R owes S-PLUS users for the majority of its software.  Some porting from R
>to S-PLUS is now happening and I'd like to make that easy and to encourage
>it.  (Hence some of the advice on generics/methods is stricter than it
>needs to be for current R, but it is essential for current S.)
>
>There is also a transitional issue, as stored glm objects would not have
>the dispersion parameter set.  That makes this (quite a lot) harder to
>implement, and I am rather against altering the definition of a class.

All good points.

I had in mind that glm would have an optional argument, prior.dispersion 
say, and that that argument would be passed to the glm.object as an 
optional component, just as prior.weights is an optional component in the 
glm.object in S-Plus. That would allow glm.objects without the 
prior.dispersion component to be treated just as at present. I notice now 
though that prior.weights is *always* present in R glm.objects even when 
weights was not present in the glm call, unlike S, so R may be working with 
a stricter class definition than S.
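
The point about prior.weights can be checked with a small sketch. The data
frame d and its values are invented purely for illustration; the only claim
being tested is that the component exists, and is all ones, when no weights
argument is supplied.

```r
# Made-up data, for illustration only.
d <- data.frame(x = 1:10, y = c(2, 3, 6, 7, 8, 9, 10, 12, 15, 20))

# Fit a Poisson glm without a weights argument.
fit <- glm(y ~ x, family = poisson, data = d)

# Is prior.weights stored anyway, and is it a vector of ones?
has_pw  <- "prior.weights" %in% names(fit)
all_one <- all(fit$prior.weights == 1)
print(c(has_pw = has_pw, all_one = all_one))
```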

There are already a number of components (aic, converged, boundary and
xlevels) that are present in R glm.objects but not in S objects (at least
not in S-Plus 2000, which is the most recent version that I have access
to). So a prior.dispersion component would at least not set a precedent in
that regard.

> > The advantage of an explicit fixed-dispersion-attribute would come when
> > (i) the dispersion happens to be known even though the dispersion is not
> > always known for the response family being used. Eg the family is Gamma but
> > you know that the responses are multiples of chi-square random variables on
> > 1 df.
> > (ii) someone defines a new glm family, other than the binomial or Poisson,
> > with fixed dispersion
> > (iii) one wants to use a binomial or Poisson family with variable
> > dispersion, without switching to the quasi family.
>
>Take a closer look over (iii): there are quasibinomial and quasipoisson
>families in R.

Thanks for the pointers to quasibinomial and quasipoisson. They're not in
S-Plus and I wasn't aware of them.
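
As a rough illustration of what these families do for case (iii), the sketch
below fits the same made-up counts (the data frame d is an assumption, not
real data) with poisson and with quasipoisson. The coefficients should agree;
only the dispersion reported by summary differs, being fixed at 1 in the
first case and estimated from the Pearson statistic in the second.

```r
# Made-up counts, for illustration only.
d <- data.frame(x = 1:10, y = c(2, 3, 6, 7, 8, 9, 10, 12, 15, 20))

fit_pois  <- glm(y ~ x, family = poisson,      data = d)
fit_quasi <- glm(y ~ x, family = quasipoisson, data = d)

# Dispersion as reported by summary.glm for each fit.
disp_pois  <- summary(fit_pois)$dispersion   # fixed by the family
disp_quasi <- summary(fit_quasi)$dispersion  # estimated from the data

# Pearson chi-square statistic divided by the residual degrees of freedom.
pearson <- sum(residuals(fit_quasi, type = "pearson")^2) / df.residual(fit_quasi)

print(c(poisson = disp_pois, quasipoisson = disp_quasi, pearson = pearson))
```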

> > In case (i), you can work around by calling summary with dispersion=2 (for
> > chisquare_1 responses), but not all functions which take glm.objects as
> > arguments have a dispersion argument. And in the spirit of object
> > orientated programming, shouldn't the glm.object contain all the
> > information necessary to construct a standard error or anova from it?
>
>I believe all functions which need to know the dispersion do have a
>dispersion argument (although that was not true a while back).  Can you
>please list the exceptions?

R seems to have been very careful in this regard, and your own
contributions to the R project have no doubt gone a good way to making this
true.

I had in mind the pointwise function. predict.glm accepts a dispersion 
argument, but when pointwise takes the output object from predict.glm, it 
"forgets" that the dispersion was known rather than estimated and uses a 
t-distribution instead of normal for the confidence intervals. (I notice 
now though that pointwise is not in the R base package, so this might not 
be a fair example.)
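
To make the distinction concrete, here is a minimal sketch of normal-based
intervals built directly from predict.glm output when the dispersion is
supplied. The simulated data, the seed and the new x values are assumptions
for illustration; this is not the pointwise function, only the calculation
one would want in the known-dispersion case.

```r
# Simulated responses: multiples of chi-square(1) variables, i.e. Gamma
# with shape 1/2, so the dispersion is known to be 2.
set.seed(1)
n  <- 50
x  <- runif(n)
mu <- exp(1 + x)
y  <- mu * rchisq(n, df = 1)

fit <- glm(y ~ x, family = Gamma(link = "log"))

# Standard errors on the link scale, computed with the known dispersion.
p <- predict(fit, newdata = data.frame(x = c(0.25, 0.75)),
             se.fit = TRUE, dispersion = 2)

# Dispersion known, so use normal quantiles rather than t quantiles.
z <- qnorm(0.975)
ci_link <- cbind(lower = p$fit - z * p$se.fit,
                 upper = p$fit + z * p$se.fit)
ci_mean <- exp(ci_link)   # back-transform through the log link
print(ci_mean)
```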

print.glm outputs AIC for the fitted model but doesn't accept a dispersion 
argument. The glm documentation says that AIC is computed "assuming that 
the dispersion is known", but a dispersion estimate of some sort must be 
needed for the AIC value. Looking at Gamma()$aic it appears that the 
dispersion is always estimated from the mean residual deviance whether the 
dispersion is actually known or not.
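
A sketch of that reading of Gamma()$aic follows. It assumes the AIC stored
in the fit is the family's aic value plus twice the model rank, and that the
family plugs in deviance/n as the dispersion; both are my understanding of
the current code rather than documented guarantees, and the simulated data
are for illustration only.

```r
# Simulated Gamma responses, for illustration only.
set.seed(2)
n <- 40
x <- runif(n)
y <- rgamma(n, shape = 2, scale = exp(1 + x) / 2)

fit <- glm(y ~ x, family = Gamma(link = "log"))

# Dispersion as the mean residual deviance (assumed to match Gamma()$aic).
disp <- deviance(fit) / n
mu   <- fitted(fit)

# -2 log-likelihood at that dispersion, plus 2 for the dispersion itself,
# plus 2 for each regression coefficient.
manual <- -2 * sum(dgamma(y, shape = 1 / disp, scale = mu * disp, log = TRUE)) +
  2 + 2 * fit$rank

print(c(stored = fit$aic, manual = manual))
```

If the two numbers agree, the reported AIC does not depend on any dispersion
value the user may happen to know.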


My main point is that it would be nice for glm.objects to "know" how they 
should be treated, so that functions down the track don't need to keep 
including dispersion= options. I won't push any further on the topic though 
at least until I have more experience with the R glm functions.
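
A minimal sketch of the idea, under stated assumptions: prior.dispersion is
a hypothetical component that is not part of R's glm class, and
summary_known is a hypothetical helper, not an existing function. The
simulated data correspond to case (i), a Gamma family with chi-square(1)
multiples, so the known dispersion is 2.

```r
# Simulated case (i) data, for illustration only.
set.seed(3)
n <- 60
x <- runif(n)
y <- exp(1 + x) * rchisq(n, df = 1)

fit <- glm(y ~ x, family = Gamma(link = "log"))

# Hypothetical optional component recording the known dispersion.
fit$prior.dispersion <- 2

# Hypothetical helper: use the stored dispersion if there is one,
# otherwise fall back to the usual behaviour (NULL means "estimate").
summary_known <- function(object, ...) {
  summary(object, dispersion = object$prior.dispersion, ...)
}

s <- summary_known(fit)
print(s$dispersion)
print(colnames(coef(s)))
```

Objects without the component would give NULL here, so they would be
treated just as at present.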

Best wishes
Gordon



