{g}lm coefficients labeling etc. [was "R0.62.3 problems"]

Thu, 3 Sep 1998 17:19:03 +0200

	[I think it would make more sense to use  R-devel instead of R-help
	 for discussions like these, that's why ...]

>>>>> "Jim" == Jim Lindsey <jlindsey@alpha.luc.ac.be> writes:

    Jim> First, I did not install R0.62.2 and there was no problem with
    Jim> binary factor variables in 0.62.1. I discovered the problem since
    Jim> .3 appeared.
Yes,
and I agree 100% with Brian's remark that we probably should have started
discussing these things on R-devel rather than in private e-mails + R-core...
Excuse us!

    Jim> My Liege undergraduate social science students have a somewhat
    Jim> different conception of contrasts than Brian's Oxford math
    Jim> students.  As it stands, the output of glm for binary factor
    Jim> variables is unusable for them.

Thank you Jim, for your complete example session -- including data !
As a remark to everyone:  I think you will get much better feedback if you
use such complete examples...

    Jim> .........
    Jim> ......... examples and thoughts...
    Jim> .........

    Jim> The need to put the label on x in the interaction in the second
    Jim> glm output clearly demonstrates that Brian is wrong.
    Jim> So put it on all the time.

    Jim> The idea is maximum transparency, not maximum
    Jim> opaqueness.  Otherwise, let's also stop showing the levels when a
    Jim> factor variable is printed out: S-Plus (apparently) does not do
    Jim> it, so it must be useless to show them. ;-)

Good point.
However, this is only a print.xx method whereas with the coefficients'
names/labels, we are talking about the structure returned by [g]lm(.).

There *are* good reasons to be S-compatible
	   --- with some carefully discussed exceptions.

I myself would rather have kept more of R's incompatibilities with S
{{ factor(..) ! }}, but I wasn't among those porting packages
from S-plus to R...

By the way: For "poly"-contrasted factors, we are already slightly
S-incompatible, since S isn't compatible with itself!

S> dimnames(contr.poly(11))
[[1]]:
character(0)

[[2]]:
 [1] ".L"    ".Q"    ".C"    " ^ 4"  " ^ 5"  " ^ 6"  " ^ 7"  " ^ 8"  " ^ 9" 
[10] " ^ 10"

S> dimnames(contr.poly(12))
[[1]]:
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"

[[2]]:
 [1] "^1"  "^2"  "^3"  "^4"  "^5"  "^6"  "^7"  "^8"  "^9"  "^10" "^11"

---------------------------------------- and we have
R> dimnames(contr.poly(12))
[[1]]
NULL

[[2]]
 [1] ".L"  ".Q"  ".C"  "^4"  "^5"  "^6"  "^7"  "^8"  "^9"  "^10" "^11"

and the consistent behavior for n=11 (or any other n).

((( thus showing yet another (unnecessary?) incompatibility:

    we have "NULL"   where  S has "character(0)"
)))
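
A quick way to inspect these labels in an R session is sketched below.
It only looks at the dimnames of contr.poly(n); that later R versions label
the columns exactly as in the output above is my assumption here.

```r
# Column labels of the polynomial contrasts for two factor sizes
lab11 <- colnames(contr.poly(11))
lab12 <- colnames(contr.poly(12))

# The first three labels are the same for both sizes
print(lab11[1:3])
print(lab12[1:3])

# Higher-order terms follow a single pattern: "^4", "^5", ...
print(lab12[-(1:3)])

# The row names are absent (NULL), where S gave character(0)
print(rownames(contr.poly(12)))
```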

    Jim> By the way, my suggestion sometime ago that contrasts should be
    Jim> printed out is only half implemented: they appear with print.glm
    Jim> but not with summary.glm.

Hmm, they *were* in 0.62.2 (put in by me, upon your suggestion!),
the version which you unfortunately missed.

The reason they disappeared is -- as Brian mentioned -- a change of the
contrasts attribute of lm(.) and glm(.) objects,
and I haven't found the time to re-add some "global contrasts: ...."
line to print.summary.[g]lm(..).
It may still be a good idea; however, a related one is the following:

What I have been thinking about since
is to add two columns to each row of the coef matrix as produced by
print.summary.[g]lm:
 1) a 1-letter code column, with
	 "D" or "c"  for "Discrete" (factor) or "continuous"
    ("D" or "c": again thinking about ASCII graphics: the 2 letters should
    be  *visually* easily discernible.. they *are* for human eyes...)
 2) another column which
	-- only for the "D" ones --
    has an  n-letter code indicating which contrasts were active;
    something like
	"Trt" "Sum" "Hlrt" "Poly" "###"
    (where "###" stands for ``other''/``self-defined''
     and covers the infinite other possibilities as indicated by BDR).
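
A rough sketch of what such an annotated table could look like follows.
The helper annotate_coefs() is hypothetical (it is not part of R), the
column names "type" and "contr" are made up for illustration, and a term
is simply called "D" if any variable in it is a factor.

```r
# Hypothetical helper: add a type code ("D"/"c") and a contrast code
# to each row of the coefficient table of an lm or glm fit.
annotate_coefs <- function(fit) {
  cf   <- coef(summary(fit))                  # rows = estimable coefficients
  keep <- !is.na(coef(fit))                   # drop aliased coefficients
  asgn <- attr(model.matrix(fit), "assign")[keep]  # term index; 0 = intercept
  tl   <- attr(terms(fit), "term.labels")
  ctr  <- fit$contrasts                       # NULL when there are no factors
  codes <- c(contr.treatment = "Trt", contr.sum = "Sum",
             contr.helmert = "Hlrt", contr.poly = "Poly")
  type  <- character(length(asgn))
  cname <- character(length(asgn))
  for (i in seq_along(asgn)) {
    if (asgn[i] == 0) next                    # intercept: leave both blank
    vars <- strsplit(tl[asgn[i]], ":", fixed = TRUE)[[1]]
    fac  <- intersect(vars, names(ctr))
    if (length(fac) == 0) {
      type[i] <- "c"
    } else {
      type[i] <- "D"
      k <- ctr[[fac[1]]]                      # contrast name, or a matrix
      cname[i] <- if (is.character(k) && k %in% names(codes)) codes[[k]] else "###"
    }
  }
  data.frame(type = type, contr = cname, round(cf, 4),
             row.names = rownames(cf), check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Usage with made-up data and the default treatment contrasts
set.seed(1)
d <- data.frame(y = rnorm(20), x = rnorm(20), f = gl(2, 10, labels = c("a", "b")))
res <- annotate_coefs(lm(y ~ x * f, data = d))
print(res)
```

User-supplied contrast matrices end up as "###" in this sketch, since
only the four standard contrast names are mapped.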

----
This still does not solve all the problems you mentioned in your example;
at the moment (where I haven't investigated more) I tend to agree with
your statement above ``So put it on all the time.''

    Jim> As to the significance stars, I obviously immediately shut them
    Jim> off on my site Rprofile, but that is on my portable. And they will
    Jim> be on our teaching server as well. My concern is rather with the
    Jim> image of R: is it to be modern, or 1950s statistics? Must I tell
    Jim> my students not to download R from the Web because it will do
    Jim> weird things (inconsistent labels on factor variables, strange
    Jim> stars appearing, ...) but only to use the copy I give them? So
    Jim> let's set the default option to FALSE.

I have the impression that you never got my point here..
It *is* the nineties, and the 'stars'  
are a ``GRAPHICAL'' encoding of P-values, not more, not less.

We have been using them in *real* data analysis (as opposed to teaching
exercises) for five or more years now.
If you fit many models with dozens of coefficients each,
the *graphical* clues *are* valuable, believe it or not.

[[I learned before the nineties that ASCII output
  can also be considered as some sort of graphics and that
  it matters how (and with which precision,...)
  you print numbers,  tables and such...

  Maybe the really old-fashioned thing is that we still use ASCII output
  quite a bit;  however, a lot of us realize that it has the big advantage
  that it is *portable* and *archivable* ...
  are you sure that in 15 years you will be able to easily display/print
  the GIFs you create today?
]]

I see that there's a good reason not to have (doubtful) P values at all;
that's why vanilla S (as opposed to S-plus) never had P values in
print.summary.lm(.).
However, I think (pro statisticians and social scientists) will want them
often enough.

Back to the ``graphical P values''  
((I should never have accepted the name "significance stars"!)):

I've been thinking of a new default,
neither TRUE nor FALSE, which would not draw them when either
	- there are only 1 or 2 coefficients
	- all P-values would be drawn as "***"
	- all P-values would be drawn as " "
		(which is already an empty drawing now, but
		 would also drop the corresponding ``legend'')
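
The rule could be sketched roughly as follows. The function name
show_stars() is made up, and the cutpoints 0.001, 0.01, 0.05, 0.1 are my
assumption about what the printing code uses; this is not existing R code.

```r
# Hypothetical decision rule: should the "graphical P values" be drawn?
show_stars <- function(pvals) {
  pvals <- pvals[!is.na(pvals)]
  # Encode each P-value as a symbol (assumed cutpoints)
  stars <- as.character(cut(pvals,
                            breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                            labels = c("***", "**", "*", ".", " "),
                            include.lowest = TRUE))
  # Draw only if there are more than 2 coefficients and the symbols
  # are neither all "***" nor all blank
  length(pvals) > 2 && !all(stars == "***") && !all(stars == " ")
}

show_stars(c(0.2, 0.5))            # only 2 coefficients
show_stars(c(1e-5, 1e-6, 1e-7))    # everything would be "***"
show_stars(c(0.3, 0.5, 0.9))       # everything would be blank
show_stars(c(1e-5, 0.03, 0.5))     # mixed: worth drawing
```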

And -- in any case--
I don't think it would be so hard to tell your students to set a few
options(..) in their profile or before they start an analysis.
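
For the stars, for instance, one line is enough. I am assuming the option
name show.signif.stars here; check options() in your own version.

```r
# Turn the stars off for this session; the same line can go into a
# profile file so that it applies to every session
options(show.signif.stars = FALSE)

# Check the current setting
print(getOption("show.signif.stars"))
```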

We will have more options in the future, and some will be good for
teaching, others will be preferable for data analysis,
others for ``number crunching'', etc.

    Jim> If anyone is interested in implementing profile likelihood
    Jim> intervals, there are far simpler and faster numerical methods than
    Jim> Brian's splines. Personally, I prefer plotting the whole profile
    Jim> curve. With glm, this is quick and easy for the linear parameters
    Jim> (but not the scale) using an offset. Most but not all of my
    Jim> library functions allow it to be done simply as well. But such
    Jim> curves do not provide the needed replacement printout instead of
    Jim> standard errors for non-normal models ...

Allowing a different, more sensible output from print.summary...(.)
in these cases could be another options() setting, I think.
The default could then depend on ``n'' and ``p'': for small models you
could afford the considerable extra computation, and for large n*p you'd
rather have a quick output from  summary(.).
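
For the record, here is how I read Jim's offset approach for one linear
parameter, as a minimal sketch only: the function profile_dev(), the
variable names and the simulated Poisson data are all made up, and this is
not his library code.

```r
# Profile deviance for the coefficient of x: fix it at each value b,
# move b * x into an offset, and refit the remaining parameters.
profile_dev <- function(beta_grid, y, x, z, family = poisson()) {
  sapply(beta_grid, function(b)
    deviance(glm(y ~ z + offset(b * x), family = family)))
}

# Made-up example data
set.seed(42)
n <- 100
x <- rnorm(n)
z <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.3 * x - 0.2 * z))

full <- glm(y ~ z + x, family = poisson())
bhat <- unname(coef(full)["x"])

# Evaluate the profile on a grid around the estimate and plot the curve
grid <- seq(bhat - 0.4, bhat + 0.4, length.out = 21)
prof <- profile_dev(grid, y, x, z)
plot(grid, prof - deviance(full), type = "l",
     xlab = "coefficient of x", ylab = "deviance difference")
```

The curve touches zero at the estimate; an approximate interval would be
read off where it crosses a chi-squared quantile such as qchisq(0.95, 1).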

Martin