[Rd] proposal: allowing alternative variance estimators in glm/lm

Tue Jan 2 19:03:02 CET 2007

On Tue, 2 Jan 2007, ivo welch wrote:
>
> As a non-statistician user of R, maybe a hook functionality at strategic
> places could provide some flexibility without too much pain.   I think
> replacing the standard output from summary.lm would be a bad idea (it
> could easily create errors downstream, when idiots like myself ask "why
> don't I get the s.e. that stata produces?  duh---you loaded
> heteroskedasticity adjustment, but forgot about it).  But I think some
> flexibility to add more information would be a very good thing.
>
> Hooks that can be set by functions (perhaps cascades) would allow third
> parties to create additional statistics, that could survive future
> changes to the functions themselves, without requiring a full object
> paradigm.   For example, summary.lm could provide two hooks that allow
> programmers to chain my own objects to either the ans$coefficients and
> the ans object.  (I guess even one hook would do.)  Well-thought-out
> hooks could also add to print methods, etc., without requiring complete
> function rewrites, and would survive future changes in the real R code
> itself.

In some sense this is what I was proposing -- I think that summary.lm and 
predict.lm should get coefficients from coef() and variance matrices from 
vcov() to provide hooks.

This should ideally happen automatically by inheritance, but for 
historical reasons all the basic code was written for the most specialized 
case (classical linear model) rather than the most general case.

>> From the perspective of a first-time amateurish end-user, an invokation
> of "library(lm.addnormalized)" could then magically always add a
> normalized coefficient to the coefficient output.  An invokation of
> "library(lm.addheteroskedasticity)" could magically always add
> heteroskedasticity se's and T-stats.  And so on.

Ick.

I really think you want these properties to be part of the object, not 
global options (and particularly not global options set by the library() 
function).  It should be possible to simultaneously have objects that 
use the fisher-information standard errors and other objects that use 
sandwich estimators.

I would say that "using sandwich standard errors" is a property that 
should be specified when the object is created (as it is for 
survival::coxph()  and gee::gee()).

"Displaying normalized coefficients" is a bit different.  I don't think 
you want to override what the coef() method produces, since that will mess 
up fitted values and predictions. You would also need to override the 
vcov() method, and that would cause other problems.

So in order to do what you want, the objects would need a printedcoef() 
and printedSE() method in addition to the actual coef() and vcov(). These 
would have to be called by a summary-like function.  But even this isn't 
the level of generality that you would want, since if you are doing this 
work it would be nice to allow transformed coefficients (eg odds ratios 
rather than log odds ratios for logistic regression), and you probably 
then need a printedCI() method for confidence intervals.  This would give 
quite a useful modelsummary() function (provided it did something 
sensible when the additional methods weren't present).   The Design 
package does things along these lines already, but its display functions 
don't work on vanilla model objects.

In any case, I don't think this function should be called summary.lm(). My 
proposal was just to allow predict.lm and summary.lm to be used in cases 
where they already give the right sort of results and in addition to tidy 
up the duplication of variance-matrix code.

 	-thomas