[R] [RsR] p-values for robust regression
matias at stat.ubc.ca
Wed Jul 5 19:19:10 CEST 2006
I would only add a few comments to Martin's explanation below:
- tau-tests were primarily meant for general nested hypotheses, whereas
for a hypothesis of the form "beta_j = 0" for a single index "j" one can
also use (as it's done for glm estimators, say) *approximate* p-values
based on a normal approximation to the distribution of the ratio
"estimator / standard error" -- these are the p-values that
"summary.lmrob" currently reports, based on the *robust standard errors*
of Croux et al. 2003 (full reference in Martin's e-mail below) that
remain valid even when the data may contain asymmetric outliers;
- the asymptotic distribution of tau-tests is known for symmetrically
distributed errors and, furthermore, it involves a weighted sum of
independent chi-squared distributions, with weights depending on the
eigenvalues of the (asymptotic) covariance matrix of the explanatory
variables. Not surprisingly, their p-values are rather difficult to
calculate in practice (although approximations do exist: see Alfio
Marazzi's ROBETH and S-PLUS's robust libraries);
- for nested linear hypotheses, the tests in Markatou and Hettmansperger
(1990, "Robust bounded-influence tests in linear models", JASA, 85,
187-190) provide an alternative to the tau-tests with the "usual"
asymptotic chi-squared distribution, although this asymptotic
approximation is also known to hold for symmetrically distributed
errors, and moreover, seems to be rather sensitive to the presence of
outliers (see my paper in JSPI, 2005, 128, 241-257), while the Robust
Bootstrap performs quite well in estimating the p-values for these tests;
- the standard errors and p-values for individual hypotheses of the form
"beta_j=0" reported by summary.lmrob (in robustbase) are (robust)
asymptotic approximations, which should be interpreted and used accordingly;
- if you're interested in nested linear hypotheses, there are some
proposals in the literature to obtain robust p-values for robust tests,
although they have not been implemented in robustbase yet (hopefully
they will be in the near future).
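
As a rough illustration of the two kinds of p-values discussed above, here
is a minimal base-R sketch. The helper names wald_p_value and
weighted_chisq_p_value, the input numbers and the weights are all made up
for illustration. The second helper assumes that each chi-squared component
has 1 degree of freedom and approximates the tail probability by plain
Monte Carlo; it is not claimed to be what ROBETH or the S-PLUS robust
library do.

```r
# Hypothetical helper: two-sided p-value for "beta_j = 0" based on a
# normal approximation to the ratio estimate / standard error.
wald_p_value <- function(estimate, std_error) {
  z <- estimate / std_error
  2 * pnorm(-abs(z))
}

# Hypothetical helper: Monte Carlo tail probability P(sum_k w_k X_k >= stat),
# where the X_k are independent chi-squared variables with 1 degree of
# freedom each (an assumption of this sketch).
weighted_chisq_p_value <- function(stat, weights, nsim = 1e5, seed = 1) {
  set.seed(seed)
  draws <- matrix(rchisq(nsim * length(weights), df = 1),
                  nrow = nsim, ncol = length(weights))
  sims <- as.vector(draws %*% weights)  # one simulated weighted sum per row
  mean(sims >= stat)
}

# Made-up numbers, for illustration only
p_single <- wald_p_value(estimate = 1.2, std_error = 0.5)
p_nested <- weighted_chisq_p_value(stat = 6, weights = c(1.5, 0.8, 0.3))
```

For a fitted model, the estimate and its robust standard error would be
taken from the coefficient table of summary(mod); in a real tau-test the
weights would have to be estimated from the data, which is the hard part.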
Matias Salibian-Barrera - Department of Statistics
University of British Columbia - matias at stat.ubc.ca
Phone: (604) 822-3410 - Fax: (604) 822-6960
Martin Maechler wrote:
> [Oops! Written 6 hours ago, the following was accidentally not sent.]
>>>>>> "Celso" == Celso Barros <celso.barros at gmail.com>
>>>>>> on Wed, 5 Jul 2006 04:09:17 -0300 writes:
> Celso> When I run rlm to obtain robust standard errors, my output does not include
> Celso> p-values. Is there any reason p-values should not be used in this case?
> yes (see also below).
> Celso> Is there an argument I could use in rlm so that the output does
> Celso> include p-values?
> What are the reasons?
> How to properly do hypothesis testing in the context of robust
> regression has partly been an open research problem. Whereas
> it has been solved in Elvezio Ronchetti's PhD thesis (1982)
> by tau-tests, see chapter 7 of Hampel, Rousseeuw, Ronchetti,
> Stahel (1986), these are not (directly) related to standard
> errors, and t-tests with some degrees of freedom.
> Hence they are not so intuitively explainable, and not entirely
> trivial to implement. Probably this is one of the reasons why they
> (tau-tests) haven't been programmed for MASS (the book and the R package).
> Recent research, namely,
> Croux, C., Dhaene, G. and Hoorelbeke, D. (2003) _Robust standard
> errors for robust estimators_, Discussion Papers Series 03.16,
> K.U. Leuven, CES.
> has been made use of by Matias Salibian-Barrera's roblm()
> function now available as lmrob() from package 'robustbase'.
> There, mod <- lmrob(........); summary( mod )
> does provide you with P-values.
> But we still recommend *not* to ``believe in the P-values''
> blindly, but rather base your data analysis on serious analysis
> of residuals and other model checking.
> R-SIG-Robust at r-project.org mailing list