[Rd] How do I know if the deviance of a glm fit was fixed?

Sat, 19 Jan 2002 07:31:37 +0000 (GMT)

On Sat, 19 Jan 2002, Gordon Smyth wrote:

> I think that it could improve the glm family function to include an
> argument or attribute that tells explicitly that the dispersion is fixed,
> equivalent to the $scale declaration in GLIM. This would be set T by
> default for the binomial and Poisson families and F by default for most
> other families. This could be done I think in a way which would add extra
> features but would not contradict the S-Plus "Blue Book". In effect, I
> would like the dispersion to be considered part of the "fit" of a glm
> because it is needed for standard errors and hypothesis testing.

You would need to say what the fixed value is, so the obvious thing
would be to add a dispersion argument as used in summary.glm, predict.glm
and anova.glm, with the same semantics.
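
For readers unfamiliar with that argument, here is a minimal sketch of how
it behaves. The simulated data and the variable names are assumptions made
for illustration only; the responses are Gamma with shape 1/2, so the true
dispersion is 2.

```r
# Illustrative data (not from the thread): Gamma responses with shape 1/2,
# i.e. mean mu and dispersion 1/shape = 2
set.seed(1)
x  <- runif(50)
mu <- exp(1 + x)
y  <- rgamma(50, shape = 0.5, rate = 0.5 / mu)

fit <- glm(y ~ x, family = Gamma(link = "log"))

s_est   <- summary(fit)                  # dispersion estimated from the fit
s_fixed <- summary(fit, dispersion = 2)  # dispersion supplied as known

s_est$dispersion
s_fixed$dispersion

# Standard errors scale with the square root of the dispersion,
# so these two quantities agree
coef(s_fixed)[, "Std. Error"] / coef(s_est)[, "Std. Error"]
sqrt(2 / s_est$dispersion)

# anova.glm and predict.glm accept the same argument
anova(fit, dispersion = 2, test = "Chisq")
predict(fit, se.fit = TRUE, dispersion = 2)$se.fit[1:3]
```

Note that the fitted object itself is unchanged; the known value has to be
passed again on every call.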

The argument against is S-compatibility. Many R users also use S-PLUS, and
R owes S-PLUS users for the majority of its software.  Some porting from R
to S-PLUS is now happening and I'd like to make that easy and to encourage
it.  (Hence some of the advice on generics/methods is stricter than it
needs to be for current R, but it is essential for current S.)

There is also a transitional issue, as stored glm objects would not have
the dispersion parameter set.  That makes this (quite a lot) harder to
implement, and I am rather against altering the definition of a class.
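
As a rough sketch of the current situation: a fitted glm object records its
family but has no dispersion component, so whether the dispersion is treated
as fixed has to be inferred from the family name. The helper
dispersion_is_fixed() below is hypothetical (it is not part of R); it mirrors,
as far as I am aware, the test summary.glm applies when no dispersion
argument is given. The data are the small counts example from the glm help
page.

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson)

"dispersion" %in% names(fit)   # the object stores no dispersion component
fit$family$family              # only the family name is available

# Hypothetical helper, not an R function: treat the dispersion as fixed
# (at 1) only for the poisson and binomial families
dispersion_is_fixed <- function(object) {
  object$family$family %in% c("poisson", "binomial")
}

dispersion_is_fixed(fit)
dispersion_is_fixed(update(fit, family = quasipoisson))
```

A user-defined family with known dispersion would be misclassified by such a
name-based test, which is the gap the proposal is about.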

> The advantage of an explicit fixed-dispersion-attribute would come when
> (i) the dispersion happens to be known even though the dispersion is not
> always known for the response family being used. E.g. the family is Gamma but
> you know that the responses are multiples of chi-square random variables on
> 1 df.
> (ii) someone defines a new glm family, other than the binomial or Poisson,
> with fixed dispersion
> (iii) one wants to use a binomial or Poisson family with variable
> dispersion, without switching to the quasi family.

Take a closer look at (iii): there are quasibinomial and quasipoisson
families in R.
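
A minimal sketch of the difference, using simulated overdispersed counts
(the data and object names are illustrative assumptions):

```r
# Illustrative overdispersed counts (simulated; not from the thread)
set.seed(2)
x <- runif(100)
y <- rnbinom(100, mu = exp(1 + x), size = 2)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# The coefficient estimates are the same under both families
all.equal(coef(fit_pois), coef(fit_quasi))

# poisson takes the dispersion as 1; quasipoisson estimates it from the
# Pearson residuals, which changes the standard errors
summary(fit_pois)$dispersion
summary(fit_quasi)$dispersion
```

quasibinomial relates to binomial in the same way.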

> In case (i), you can work around by calling summary with dispersion=2 (for
> chisquare_1 responses), but not all functions which take glm.objects as
> arguments have a dispersion argument. And in the spirit of object
> orientated programming, shouldn't the glm.object contain all the
> information necessary to construct a standard error or anova from it?

I believe all functions which need to know the dispersion do have a
dispersion argument (although that was not true a while back).  Can you
please list the exceptions?

I am not arguing against a change, but I am arguing for fully informed
discussion.

[...]

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
