[R] Wald tests and Huberized variances (was: A comment about R:)

Achim Zeileis Achim.Zeileis at wu-wien.ac.at
Thu Jan 5 16:41:27 CET 2006


On Wed, 4 Jan 2006, Peter Muhlberger wrote:

One comment in advance: please use a more meaningful subject. I would have
missed this mail if a colleague hadn't pointed me to it.

> I'm someone who from time to time comes to R to do applied stats for social
> science research.
[snip]
> I would also prefer not to have to work through a
> couple books on R or S+ to learn how to meet common needs in R.  If R were

There are some overviews and pointers available for certain topics,
so-called CRAN task views:
  http://CRAN.R-project.org/src/contrib/Views/
Currently, there is not yet a "SocialSciences" view (but John Fox is
working on one). However, it might also be interesting for you to look
at the "Econometrics" view, which has some remarks about Wald tests.

> Ex. 1)  Wald tests of linear hypotheses after max. likelihood or even after
> a regression.  "Wald" does not even appear in my standard R package on a
> search.

You might want to look at waldtest() and coeftest() in package lmtest. And
you seem to have discovered linear.hypothesis() in package car. All three
perform Wald tests, providing different means of specifying the
hypothesis/alternative of the tests.
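For the simplest case, coefficient-wise Wald tests, a minimal sketch
(the lm() fit on mtcars below is purely illustrative, not from the
original mail):

```r
# coefficient-wise Wald tests via coeftest() from package lmtest;
# mtcars is used purely for illustration
library(lmtest)

fm <- lm(mpg ~ wt + hp, data = mtcars)
coeftest(fm)   # t tests of each coefficient, as in summary(fm)
```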

> There's no comment in the lm help or optim help about what function
> to use for hypothesis tests.

Well, the lm() man page does say:
  The functions 'summary' and 'anova' are used to obtain and print a
  summary and analysis of variance table of the results.

As for optim(), it is not that straightforward, because optim() does not
know whether it is maximizing a proper likelihood or not.

> I know that statisticians prefer likelihood
> ratio tests, but Wald tests are still useful and indeed crucial for
> first-pass analysis.  After searching with Google for some time, I found
> several Wald functions in various contributed R packages I did not have
> installed.  One confusion was which one would be relevant to my needs.  This
> took some time to resolve.

Yes, this is a problem that is at least partly addressed by the CRAN task
views.

> I concluded, perhaps on insufficient evidence,
> that package car's Wald test would be most helpful.  To use it, however, one
> has to put together a matrix for the hypotheses, which can be arduous for a
> many-term regression or a complex hypothesis.  In comparison, in Stata one
> simply states the hypothesis in symbolic terms.

waldtest() does the latter and is linked in the "See Also" section of
linear.hypothesis().
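To illustrate the two styles of specifying the same hypothesis, here is a
sketch (the fit is purely illustrative; in later versions of car the
function is called linearHypothesis()):

```r
library(car)     # linear.hypothesis()
library(lmtest)  # waldtest()

fm <- lm(mpg ~ wt + hp, data = mtcars)

# matrix interface: one row per restriction on (Intercept, wt, hp);
# this row encodes the hypothesis that the hp coefficient is zero
linear.hypothesis(fm, rbind(c(0, 0, 1)))

# symbolic interface: compare against the model with hp removed
waldtest(fm, . ~ . - hp)
```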

> I also don't know for
> certain that this function in car will work or work properly w/ various
> kinds of output, say from lm or from optim.

The man page of linear.hypothesis() does say that there are methods for
"lm" and "glm" objects (but not for results from optim).

> Ex. 2) Getting neat output of a regression with Huberized variance matrix.
> I frequently have to run regressions w/ robust variances.  In Stata, one
> simply adds the word "robust" to the end of the command or
> "cluster(cluster.variable)" for a cluster-robust error.  In R, there are two
> functions, robcov and hccm.  I had to run tests to figure out what the
> relationship is between them and between them and Stata (robcov w/o cluster
> gives hccm's hc0; hccm's hc1 is equivalent to Stata's 'robust' w/o cluster;
> etc.).

This is rather clearly documented on the respective man pages. hccm()
provides HC covariance matrices without clustering, as does vcovHC() in
package sandwich. I plan to extend vcovHC() to also deal with clustered
data, but I haven't got round to it yet.

> A single sentence in hccm's help saying something to the effect that
> statisticians prefer hc3 for most types of data might save me from having to
> scramble through the statistical literature to try to figure out which of
> these I should be using.  A few sentences on what the differences are
> between these methods would be even better.

Yes and no. I'll add some more comments about the different HC-type
covariance matrices, but, on the other hand, this is just the software,
which cannot replace an understanding of the underlying theory.

> Then, there's the problem of
> output.  Given that hc1 or hc3 are preferred for non-clustered data, I'd
> need to be able to get regression output of the form summary(lm) out of
> hccm, for any practical use.  Getting this, however, would require
> programming my own function.

Or use coeftest() from package lmtest, which is intended particularly
for this.
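A sketch of such "robust" regression output, combining coeftest() with
vcovHC() from package sandwich (the fit itself is only illustrative):

```r
library(lmtest)
library(sandwich)

fm <- lm(mpg ~ wt + hp, data = mtcars)

# summary(fm)-style coefficient table, but with HC3 standard errors
coeftest(fm, vcov = vcovHC(fm, type = "HC3"))
```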

> Huberized t-stats for regressions are
> commonplace needs, an R oriented a little toward more everyday needs would
> not require programming of such needs.  Also, I'm not sure yet how well any
> of the existing functions handle missing data.

When fitting a linear model via lm() you can specify a suitable na.action.
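For example (illustrative data; na.omit drops incomplete cases before
fitting):

```r
# rows with NA in y or x are dropped before the fit
dat <- data.frame(y = c(1, 2, NA, 4), x = c(1, NA, 3, 4))
fm <- lm(y ~ x, data = dat, na.action = na.omit)
```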

The released versions of lmtest and sandwich can deal with Wald tests and
sandwich covariance matrix estimators for linear models. I've got
development versions ready which make the functions fully object-oriented
and thus applicable to "glm" or "survreg" objects (for censored/tobit
regression) as well. I plan to release these soon; contact me if you want
a devel snapshot.

Best wishes,
Z




More information about the R-help mailing list