{g}lm coefficients labeling etc. [was "R0.62.3 problems"]
Martin Maechler <maechler@stat.math.ethz.ch>
Thu, 3 Sep 1998 17:19:03 +0200
[I think it would make more sense to use R-devel instead of R-help
for discussions like these, that's why ...]
>>>>> "Jim" == Jim Lindsey <jlindsey@alpha.luc.ac.be> writes:
Jim> First, I did not install R0.62.2 and there was no problem with
Jim> binary factor variables in 0.62.1. I discovered the problem since
Jim> .3 appeared.
Yes, and I agree 100% with Brian's remark that we probably should have started
discussing these things on R-devel rather than in private e-mails + R-core...
Excuse us!
Jim> My Liege undergraduate social science students have a somewhat
Jim> different conception of contrasts than Brian's Oxford math
Jim> students. As it stands, the output of glm for binary factor
Jim> variables is unusable for them.
Thank you Jim, for your complete example session -- including data !
As a remark to everyone: I think you will get much better feedback if you
use such complete examples...
Jim> .........
Jim> ......... examples and thoughts...
Jim> .........
Jim> The need to put the label on x in the interaction in the second
Jim> glm output clearly demonstrates that Brian is wrong.
Jim> So put it on all the time.
Jim> The idea is maximum transparency, not maximum
Jim> opaqueness. Otherwise, let's also stop showing the levels when a
Jim> factor variable is printed out: S-Plus (apparently) does not do
Jim> it, so it must be useless to show them. ;-)
Good point.
However, this is only a print.xx method whereas with the coefficients'
names/labels, we are talking about the structure returned by [g]lm(.).
There *are* good reasons to be S-compatible
--- with some carefully discussed exceptions.
I myself would rather have kept more of the non-compatibilities of R
{{ factor(..) ! }}, but I wasn't among those doing the port of packages
from S-plus to R...
By the way: For "poly"-contrasted factors, we are already slightly
S-incompatible, since S isn't compatible with itself!
S> dimnames(contr.poly(11))
[[1]]:
character(0)
[[2]]:
[1] ".L" ".Q" ".C" " ^ 4" " ^ 5" " ^ 6" " ^ 7" " ^ 8" " ^ 9"
[10] " ^ 10"
S> dimnames(contr.poly(12))
[[1]]:
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
[[2]]:
[1] "^1" "^2" "^3" "^4" "^5" "^6" "^7" "^8" "^9" "^10" "^11"
---------------------------------------- and we have
R> dimnames(contr.poly(12))
[[1]]
NULL
[[2]]
[1] ".L" ".Q" ".C" "^4" "^5" "^6" "^7" "^8" "^9" "^10" "^11"
and the consistent behavior for n=11 (or any other n).
((( thus showing yet another (unnecessary?) incompatibility:
we have "NULL" where S has "character(0)"
)))
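For anyone who wants to check the labels in their own R session, here is a minimal sketch. The helper name poly_labels is made up for illustration; only contr.poly() and dimnames() come from the session shown above.

```r
# Hypothetical helper: column labels of the polynomial contrasts for n levels.
poly_labels <- function(n) dimnames(contr.poly(n))[[2]]

labs11 <- poly_labels(11)
labs12 <- poly_labels(12)

# If the naming scheme does not depend on n, the labels for n = 11
# are simply the first ten labels for n = 12.
same_scheme <- identical(labs11, labs12[seq_along(labs11)])
print(labs12)
print(same_scheme)
```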
Jim> By the way, my suggestion sometime ago that contrasts should be
Jim> printed out is only half implemented: they appear with print.glm
Jim> but not with summary.glm.
Hmm, they *were* in 0.62.2 (put in by me, upon your suggestion!),
the version which you unfortunately missed.
The reason they are gone is -- as Brian mentioned -- a change of the contrasts
attribute of lm(.) and glm(.) objects,
and I haven't found the time to re-add some "global contrasts: ...."
line to print.summary.[g]lm(..).
May still be a good idea; however, a related one is the following:
What I have been thinking about since
is an idea to add, to each row of the coef matrix as produced by
print.summary.[g]lm:
1) a 1-letter code column, with
"D" or "c" for "Discrete" (factor) or "continuous"
("D" or "c": again thinking about ASCII graphics: the 2 letters should
be *visually* easily discernible; they *are* for human eyes...)
2) another column which
-- only for the "D" ones --
has another n-letter code indicating which contrasts were active;
something like
"Trt" "Sum" "Hlrt" "Poly" "###"
(where "###" stands for ``other''/``self-defined''
and contains the infinite other possibilities as indicated by BDR).
----
This still does not solve all the problems you mentioned in your example;
at the moment (where I haven't investigated more) I tend to agree with
your statement above ``So put it on all the time.''
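To make the two-column idea a bit more concrete, here is a rough sketch of how such codes could be derived from a fitted object. Nothing like coef_codes() exists in R; the function name, the returned data frame and the treatment of interaction terms (codes pasted together with ":") are all assumptions made for illustration. It relies only on the "assign" attribute of the model matrix, the "factors" attribute of the terms, and the contrasts component of the fit.

```r
# Hypothetical helper (not part of R): one row per coefficient with a
# "D"/"c" type code and, for factor terms, a short contrast code.
coef_codes <- function(fit) {
  asgn  <- attr(model.matrix(fit), "assign")  # term index per column, 0 = intercept
  fac   <- attr(terms(fit), "factors")        # variables (rows) x terms (columns)
  ctr   <- fit$contrasts                      # named list, one entry per factor
  short <- c(contr.treatment = "Trt", contr.sum = "Sum",
             contr.helmert = "Hlrt", contr.poly = "Poly")
  type <- contrast <- character(length(asgn)) # stays "" for the intercept
  for (i in seq_along(asgn)) {
    if (asgn[i] == 0) next
    vars  <- rownames(fac)[fac[, asgn[i]] > 0]  # variables in this term
    fvars <- intersect(vars, names(ctr))        # those that are factors
    if (length(fvars) == 0) { type[i] <- "c"; next }
    type[i] <- "D"
    codes <- vapply(fvars, function(v) {
      cv <- ctr[[v]]
      # a contrast matrix or an unknown name counts as "other"
      if (is.character(cv) && cv %in% names(short)) short[[cv]] else "###"
    }, character(1))
    contrast[i] <- paste(codes, collapse = ":")
  }
  data.frame(coef = names(coef(fit)), type = type, contrast = contrast,
             stringsAsFactors = FALSE)
}

# Example with a built-in data set: supp is a factor, dose is numeric.
fit <- lm(len ~ supp + dose, data = ToothGrowth,
          contrasts = list(supp = "contr.sum"))
codes <- coef_codes(fit)
print(codes)
```

Whether an interaction of a factor with a continuous variable should count as "D" is exactly the kind of question the proposal leaves open; the sketch simply marks any term involving a factor as "D".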
Jim> As to the significance stars, I obviously immediately shut them
Jim> off on my site Rprofile, but that is on my portable. And they will
Jim> be on our teaching server as well. My concern is rather with the
Jim> image of R: is it to be modern, or 1950s statistics? Must I tell
Jim> my students not to download R from the Web because it will do
Jim> weird things (inconsistent labels on factor variables, strange
Jim> stars appearing, ...) but only to use the copy I give them? So
Jim> let's set the default option to FALSE.
I have the impression that you never got my point here..
It *is* the nineties, and the 'stars'
are a ``GRAPHICAL'' encoding of P-values, not more, not less.
We have been using them in *real* data analysis (as opposed to teaching
exercises) for five or more years now.
If you fit many models with dozens of coefficients each,
the *graphical* clues *are* valuable, believe it or not.
[[I learned before the nineties that ASCII output
can also be considered as some sort of graphics and that
it matters how (and with which precision,...)
you print numbers, tables and such...
Maybe the really old-fashioned thing is that we still use ASCII output
quite a bit; however, a lot of us realize that it has the big advantage
that it is *portable* and *archivable* ...
are you sure that in 15 years you will be able to easily display/print
the GIFs you create today?
]]
I see that there's a good reason not to have (doubtful) P values at all;
that's why vanilla S (as opposed to S-plus) never had P values in
print.summary.lm(.).
However, I think professional statisticians and social scientists will want
them often enough.
Back to the ``graphical P values''
((I should never have accepted the name "significance stars"!)):
I've been thinking of a new default,
neither TRUE nor FALSE, which would not draw them when either
- there's only 1 or 2 coefficients
- all P-values would be drawn as "***"
- all P-values would be drawn as " "
(which is already an empty drawing now, but
would also drop the corresponding ``legend'')
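The three conditions above could be sketched as a small decision function. This is only an illustration: show_stars() is a made-up name, the cutpoints and symbols are the ones used in the legend of current R's coefficient tables, and the P-values are assumed to contain no NAs.

```r
# Hypothetical decision rule (not R's actual behaviour): should the
# graphical P-value codes be printed for this vector of P-values?
show_stars <- function(pvals) {
  codes <- cut(pvals, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
               labels = c("***", "**", "*", ".", " "),
               include.lowest = TRUE)
  codes <- as.character(codes)
  if (length(codes) <= 2)  return(FALSE)  # only 1 or 2 coefficients
  if (all(codes == "***")) return(FALSE)  # everything maximally starred
  if (all(codes == " "))   return(FALSE)  # nothing to draw, drop the legend too
  TRUE
}

print(show_stars(c(1e-4, 0.2, 0.03)))   # mixed codes
print(show_stars(c(0.5, 0.6, 0.7)))     # all blank
```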
And -- in any case --
I don't think it would be so hard to tell your students to set a few
options(..) in their profile or before they start an analysis.
We will have more options in the future, and some will be good for
teaching, others will be preferable for data analysis,
others for ``number crunching'', etc.
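For the stars in particular, the setting in question is a one-liner. The option name show.signif.stars is the one R uses; where exactly the line goes (a site profile, a user's ~/.Rprofile, or the top of a script) is up to the local setup.

```r
# Turn off the significance stars for this session
# (or put the same options() call in a profile file such as ~/.Rprofile).
old <- options(show.signif.stars = FALSE)
stars_on <- getOption("show.signif.stars")
print(stars_on)

# options() returned the previous settings, so they can be restored later:
# options(old)
```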
Jim> If anyone is interested in implementing profile likelihood
Jim> intervals, there are far simpler and faster numerical methods than
Jim> Brian's splines. Personally, I prefer plotting the whole profile
Jim> curve. With glm, this is quick and easy for the linear parameters
Jim> (but not the scale) using an offset. Most but not all of my
Jim> library functions allow it to be done simply as well. But such
Jim> curves do not provide the needed replacement printout instead of
Jim> standard errors for non-normal models ...
Allowing a different, more sensible output from print.summary...(.)
in these cases could be another options() setting, I think.
The default could then depend on ``n'' and ``p'': for small models you
could afford the considerable extra computation, and for large n*p you'd
rather have a quick output from summary(.).
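As an illustration of the offset approach Jim mentions for the linear parameters, here is a minimal sketch of a profile deviance curve for one slope in a Poisson glm. The data are simulated, and the grid width and the chi-squared cutoff for an approximate 95% interval are my assumptions, not anything prescribed in this thread.

```r
set.seed(1)                               # made-up data for illustration
x <- rnorm(50)
y <- rpois(50, exp(0.3 + 0.5 * x))

full  <- glm(y ~ x, family = poisson)
b_hat <- unname(coef(full)["x"])
grid  <- seq(b_hat - 0.5, b_hat + 0.5, length.out = 41)

# Fix the slope at b via an offset and re-fit only the intercept.
prof <- vapply(grid, function(b) {
  deviance(glm(y ~ 1 + offset(b * x), family = poisson))
}, numeric(1))

# Approximate 95% interval: slope values whose deviance exceeds the
# minimum by at most the 95% quantile of chi-squared with 1 df.
inside   <- prof - deviance(full) <= qchisq(0.95, df = 1)
interval <- range(grid[inside])
print(interval)
```

Plotting prof - deviance(full) against grid gives the whole profile curve; the cost is one glm fit per grid point, which is why the n*p-dependent default above might matter.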
Martin