[R] glm questions
Paul Johnson
pauljohn at ku.edu
Tue Mar 16 07:00:34 CET 2004
Greetings, everybody. Can I ask some glm questions?
1. How do you find out -2*lnL(saturated model)?
In the output from glm, I find:
Null deviance: which I think is -2[lnL(null) - lnL(saturated)]
Residual deviance: -2[lnL(fitted) - lnL(saturated)]
The Null model is the one that includes the constant only (plus offset
if specified). Right?
I can use the Null and Residual deviance to calculate the "usual model
Chi-squared" statistic
-2[lnL(null) - lnL(fitted)].
But, just for curiosity's sake, what's the saturated model's -2lnL?
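
For concreteness, here is a minimal sketch of the quantities above, using the Poisson data from example(glm). It assumes a Poisson family, where the saturated model sets each fitted mean equal to the observed count; the object names (fit, ll_sat and so on) are only illustrative, and other families would need their own saturated log-likelihood.

```r
# Poisson example data as used in example(glm)
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Usual model chi-squared: -2[lnL(null) - lnL(fitted)],
# obtained as null deviance minus residual deviance
model_chisq <- fit$null.deviance - fit$deviance
df_chisq <- fit$df.null - fit$df.residual
p_value <- pchisq(model_chisq, df_chisq, lower.tail = FALSE)

# Saturated Poisson model (assumption: fitted mean = observed count)
ll_sat <- sum(dpois(counts, lambda = counts, log = TRUE))
minus2_ll_sat <- -2 * ll_sat

# Check: residual deviance = -2[lnL(fitted) - lnL(saturated)]
ll_fit <- as.numeric(logLik(fit))
print(all.equal(fit$deviance, -2 * (ll_fit - ll_sat)))
```

If the check prints TRUE, the residual deviance and the two log-likelihoods agree with the definitions given above, at least for this Poisson case.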
2. Why no 'scaled deviance' in output? Or, how are you supposed to tell
if there is over-dispersion?
I just checked, and SAS gives us the scaled and nonscaled deviance.
I have read the Venables & Ripley (MASS 4ed) chapter on GLM. I believe
I understand the cautionary point about overdispersion toward the end
(p. 408). Since I'm comparing lots of other books at the moment, I
believe I see people using the practice that is being criticized. The
Pearson Chi-square based estimate of dispersion is recommended and one
uses an F test to decide if the fitted model is significantly worse than
the saturated model. But don't we still assess over-dispersion by
looking at the scaled deviance (after it is calculated properly)?
Can I make a guess why glm does not report scaled deviance? Are the glm
authors trying to discourage us from making the lazy assessment in which
one concludes over-dispersion is present if the scaled deviance exceeds
the degrees of freedom?
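
To make the quantities in this question concrete, here is a rough sketch of the Pearson chi-square based dispersion estimate and the corresponding scaled deviance, again on the Poisson data from example(glm). The names phi_hat and scaled_dev are only illustrative, and refitting with the quasipoisson family is one possible way to have summary() and anova() use an estimated dispersion, not necessarily the recommended one.

```r
# Poisson example data as used in example(glm)
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Pearson chi-square based estimate of the dispersion:
# sum of squared Pearson residuals over residual degrees of freedom
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
phi_hat <- pearson_chisq / df.residual(fit)

# Scaled deviance: residual deviance divided by the estimated dispersion
scaled_dev <- deviance(fit) / phi_hat

# Refit with a quasi family so the dispersion is estimated, not fixed at 1
fit_q <- update(fit, family = quasipoisson())
print(summary(fit_q)$dispersion)

# F tests for the sequential terms, using the estimated dispersion
print(anova(fit_q, test = "F"))
```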
3. When I run "example(glm)" at the end there's a Gamma model and the
printout says:
(Dispersion parameter for Gamma family taken to be 0.001813340)
I don't find an estimate for the Gamma distribution's shape parameter in
the output. I'm uncertain what the reported dispersion parameter refers
to. It's the denominator (phi) in the exponential family formula, isn't
it?
exp[ (y*theta - c(theta)) / phi - h(y, phi) ]
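
As a sketch of how the reported dispersion relates to the shape parameter: for a Gamma variable with mean mu and shape alpha the variance is mu^2/alpha, so under the formula above phi corresponds to 1/alpha. The code below assumes the clotting data from example(glm) and that the quoted printout belongs to the lot2 model; the names phi_hat and shape_moment are illustrative. The gamma.shape() function in the MASS package gives a maximum likelihood estimate of the shape instead of this moment-type estimate.

```r
# Clotting data as used in the Gamma example in example(glm)
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18),
  lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12)
)
fit <- glm(lot2 ~ log(u), data = clotting, family = Gamma)

# Dispersion reported by summary(): Pearson chi-square / residual df
phi_hat <- summary(fit)$dispersion

# Moment-type estimate of the Gamma shape parameter (alpha = 1 / phi)
shape_moment <- 1 / phi_hat
print(c(dispersion = phi_hat, shape = shape_moment))

# Maximum likelihood estimate of the shape, if MASS is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  print(MASS::gamma.shape(fit))
}
```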
4. For GLM teaching purposes, can anybody point me at some good examples
of GLM that do not use Normal, Poisson, Negative Binomial, and/or
Logistic Regression? I want to justify the effort to understand the GLM
as a framework. We have already in previous semesters followed the
usual "econometric" approach in which OLS, Poisson/Count, and Logistic
regression are treated as special cases. Some of the students don't see
any benefit from tackling the GLM's new notation/terminology.
What I'm lacking is some persuasive evidence that the effort to master
the details of the GLM is worthwhile. I could really use some data and
reference articles that have applications of Gamma distributed (or
exponential) variables, say, or Weibull, or whatever.
I've been dropping my course notes in this directory:
http://lark.cc.ku.edu/~pauljohn/ps909/AdvancedRegression.
The documents GLM1 and GLM2 are pretty good theoretical surveys <patting
self on back/>. But I need to work harder to justify the effort by
providing examples.
I'd appreciate any feedback, if you have any. And, of course, if you
want to take these documents and use them for your own purposes, be my
guest.
5. Is it possible to find all methods that an object inherits?
I found out by reading the source code for J Fox's car package that model.matrix() returns the X matrix of coded input variables, so one can do fun things like calculate robust standard errors and such. That's really useful, because before I found that, I was recoding up a storm to re-create the X matrix used in a model.
Is there a direct way to find a list of all the other methods that would apply to an object?
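
A small sketch of what I mean, on the Poisson model from example(glm). Looping over class(fit) and calling methods(class = ...) for each class is one possible way to list candidate methods; I am not claiming it is the complete or preferred answer, and the object names are illustrative.

```r
# Poisson example data as used in example(glm)
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# The X matrix of coded input variables (intercept plus dummy columns)
X <- model.matrix(fit)
print(dim(X))

# Classes the fitted object inherits from, most specific first
print(class(fit))

# Methods registered for each of those classes
for (cl in class(fit)) {
  cat("Methods for class", cl, ":\n")
  print(methods(class = cl))
}
```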
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700