[R] A comment about R:

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Jan 5 18:42:59 CET 2006


John Fox wrote:
> Dear Peter,
> 
> 
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch 
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Peter 
>>Muhlberger
>>Sent: Wednesday, January 04, 2006 2:43 PM
>>To: rhelp
>>Subject: [R] A comment about R:
>>
> 
> 
> . . .
>  
> 
>>Ex. 1)  Wald tests of linear hypotheses after max. likelihood 
>>or even after a regression.  "Wald" does not even appear in 
>>my standard R package on a search.  There's no comment in the 
>>lm help or optim help about what function to use for 
>>hypothesis tests.  I know that statisticians prefer 
>>likelihood ratio tests, but Wald tests are still useful and 
>>indeed crucial for first-pass analysis.  After searching with 
>>Google for some time, I found several Wald functions in 
>>various contributed R packages I did not have installed.  One 
>>confusion was which one would be relevant to my needs.  This 
>>took some time to resolve.  I concluded, perhaps on 
>>insufficient evidence, that package car's Wald test would be 
>>most helpful.  To use it, however, one has to put together a 
>>matrix for the hypotheses, which can be arduous for a 
>>many-term regression or a complex hypothesis.  
>>In comparison, 
>>in Stata one simply states the hypothesis in symbolic terms.  
>>I also don't know for certain that this function in car will 
>>work or work properly w/ various kinds of output, say from lm 
>>or from optim.  To be sure, I'd need to run time-consuming 
>>tests comparing it with Stata output or examine the 
>>function's code.  In Stata the test is easy to find, and 
>>there's no uncertainty about where it can be run or its 
>>accuracy.  Simply having a comment or "see also" in lm help 
>>or mle or optim help pointing the user to the right Wald 
>>function would be of enormous help.
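
For concreteness, a minimal sketch of the matrix interface described above, assuming the car package's linear.hypothesis() (discussed in John Fox's reply below) and a hypothetical lm fit with coefficients (Intercept), x1 and x2; the matrix has one row per hypothesis and one column per coefficient, here testing x1 = x2:

  library(car)
  fit <- lm(y ~ x1 + x2, data = dat)   # dat is a hypothetical data frame
  L <- rbind(c(0, 1, -1))              # coefficient weights for x1 - x2
  linear.hypothesis(fit, L)            # Wald (F) test of x1 - x2 = 0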

The Design package's anova.Design and contrast.Design make many Wald 
tests very easy.  contrast( ) will allow you to test all kinds of 
hypotheses by stating which differences in predicted values you are 
interested in.

Frank Harrell
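
A minimal sketch of the sort of call described above, assuming the Design package and a hypothetical data frame dat with outcome y and predictors age and sex (the factor levels "male" and "female" are assumed for illustration):

  library(Design)
  dd <- datadist(dat); options(datadist = "dd")
  fit <- ols(y ~ age + sex, data = dat)    # or lrm() for a binary outcome
  anova(fit)                               # Wald tests for each term in the model
  contrast(fit, list(sex = "male"), list(sex = "female"))  # Wald test of the sex difference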

>>
> 
> 
> 
> The reference, I believe, is to the linear.hypothesis() function, which has
> methods for lm and glm objects. [To see what kinds of objects
> linear.hypothesis is suitable for, use the command
> methods(linear.hypothesis).] For lm objects, you get an F-test by default.
> Note that the Anova() function, also in car, can more conveniently compute
> Wald tests for certain kinds of hypotheses. More generally, however, I'd be
> interested in your suggestions for an alternative method of specifying
> linear hypotheses. There is currently no method for mle objects, but adding
> one is a good idea, and I'll do that when I have a chance. (In the meantime,
> it's very easy to compute Wald tests from the coefficients and the
> hypothesis and coefficient-covariance matrices. Writing a small function to
> do so, without the bells and whistles of something like linear.hypothesis(),
> should not be hard. Indeed, the ability to do this kind of thing easily is
> what I see as the primary advantage of working in a statistical computing
> environment like R -- or Stata.)
> 
> 
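
A minimal sketch of the small function described above, assuming only that coef() and vcov() work on the fitted model; L is a hypothesis matrix (or a single hypothesis given as a vector) with one entry per coefficient, and rhs is the hypothesized value of L %*% beta:

  wald <- function(model, L, rhs = 0) {
      b <- coef(model)
      V <- vcov(model)
      L <- rbind(L)                     # coerce a single hypothesis to a one-row matrix
      d <- L %*% b - rhs                # departure from the hypothesized values
      W <- drop(t(d) %*% solve(L %*% V %*% t(L)) %*% d)
      c(chisq = W, df = nrow(L), p = pchisq(W, df = nrow(L), lower.tail = FALSE))
  }

  # e.g. wald(fit, c(0, 1, -1)) tests equality of the 2nd and 3rd coefficients
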
>>Ex. 2) Getting neat output of a regression with Huberized 
>>variance matrix.
>>I frequently have to run regressions w/ robust variances.  In 
>>Stata, one simply adds the word "robust" to the end of the 
>>command or "cluster(cluster.variable)" for a cluster-robust 
>>error.  In R, there are two functions, robcov and hccm.  I 
>>had to run tests to figure out what the relationship is 
>>between them and between them and Stata (robcov w/o cluster 
>>gives hccm's hc0; hccm's hc1 is equivalent to Stata's 
>>'robust' w/o cluster; etc.).  A single sentence in hccm's 
>>help saying something to the effect that statisticians prefer 
>>hc3 for most types of data might save me from having to 
>>scramble through the statistical literature to try to figure 
>>out which of these I should be using.  A few sentences on 
>>what the differences are between these methods would be even 
>>better.  Then, there's the problem of output.  Given that hc1 
>>or hc3 is preferred for non-clustered data, I'd need to be 
>>able to get regression output of the form summary(lm) out of 
>>hccm for any practical use.  Getting this, however, would 
>>require programming my own function.  Huberized t-stats for 
>>regressions are a commonplace need; an R oriented a little 
>>more toward everyday needs would not require the user to 
>>program them.  Also, I'm not sure yet how well any of the 
>>existing functions handle missing data.
>>
> 
> 
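
For reference, a minimal sketch of the two routes compared above, assuming the car and Design packages and a hypothetical data frame dat with a cluster identifier id (robcov() requires that the fit be made with x=TRUE, y=TRUE):

  library(car)
  fit <- lm(y ~ x1 + x2, data = dat)
  hccm(fit, type = "hc3")               # heteroscedasticity-consistent covariance (HC3)

  library(Design)
  fit2 <- ols(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)
  robcov(fit2, cluster = dat$id)        # Huber cluster-sandwich covariance for the fit
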
> I think that we have a philosophical difference here: I don't like giving
> advice in documentation. An egregious extended example of this, in my
> opinion, is the SPSS documentation. The hccm() function uses hc3 as the
> default, which is an implicit recommendation, but more usefully, in my view,
> points to Long and Ervin's American Statistician paper on the subject, which
> does give advice and which is quite accessible. As well, and more generally,
> the car package is associated with a book (my R and S-PLUS Companion to
> Applied Regression), which gives advice, though, admittedly, tersely in this
> case.
> 
> The Anova() function with argument white=TRUE will give you F-tests
> corresponding to the t-tests to which you refer (though it will combine df
> for multiple-df terms in the model). To get the kind of summary you
> describe, you could use something like
> 
> mysummary <- function(model){
> 	# hccm() is from the car package; it returns the HC3 covariance matrix by default
> 	coef <- coef(model)
> 	se <- sqrt(diag(hccm(model)))
> 	t <- coef/se
> 	p <- 2*pt(abs(t), df=model$df.residual, lower.tail=FALSE)
> 	table <- cbind(coef, se, t, p)
> 	rownames(table) <- names(coef)
> 	colnames(table) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
> 	table
> 	}
> 
> Again, it's not time-consuming to write simple functions like this for one's
> own use, and the ability to do so is a strength of R, in my view. 
> 
> I'm not sure what you mean about handling missing data: functions like
> hccm(), linear.hypothesis(), and Anova() start with a model object for which
> missing data have already been handled.
> 
> Regards,
>  John
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



