[RsR] [R] p-values for robust regression

Matias Salibian-Barrera matias at stat.ubc.ca
Wed Jul 5 19:19:10 CEST 2006


Dear Celso,

I would only add a few comments to Martin's explanation below:

- tau-tests were primarily meant for general nested hypotheses, whereas 
for a hypothesis of the form "beta_j = 0" for a single index "j" one can 
also use (as is done for glm estimators, for example) *approximate* p-values 
based on a normal approximation to the distribution of the ratio 
"estimator / standard error". These are the p-values that 
"summary.lmrob" currently reports, based on the *robust standard errors* 
of Croux et al. (2003) (full reference in Martin's e-mail below), which 
remain valid even when the data may contain asymmetric outliers;

- the asymptotic distribution of tau-tests is known for symmetrically 
distributed errors and, furthermore, it involves a weighted sum of 
independent chi-squared distributions, with weights depending on the 
eigenvalues of the (asymptotic) covariance matrix of the explanatory 
variables. Not surprisingly, their p-values are rather difficult to 
calculate in practice (although approximations do exist: see Alfio 
Marazzi's ROBETH and S-PLUS's robust libraries);

- for nested linear hypotheses, the tests in Markatou and Hettmansperger 
(1990, "Robust bounded-influence tests in linear models", JASA, 85, 
187-190) provide an alternative to the tau-tests with the "usual" 
asymptotic chi-squared distribution. However, this asymptotic 
approximation is also only known to hold for symmetrically distributed 
errors and, moreover, seems to be rather sensitive to the presence of 
outliers (see my paper in JSPI, 2005, 128, 241-257), whereas the Robust 
Bootstrap performs quite well in estimating the p-values for these 
robust tests.
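
To make the first two points concrete, here is a minimal sketch in Python. It is only an illustration, not code from robustbase, ROBETH or the S-PLUS robust library. normal_approx_p_value computes the two-sided p-value for "beta_j = 0" from an estimate and its (robust) standard error, treating their ratio as approximately N(0, 1). weighted_chisq_p_value is a hypothetical Monte Carlo approximation to the upper tail of a weighted sum of independent chi-squared variables. It assumes each component has 1 degree of freedom and that the weights are already known; deriving them from the covariance matrix of the explanatory variables, as the tau-test theory requires, is not shown.

```python
import math
import random


def normal_approx_p_value(estimate, std_error):
    """Two-sided p-value for H0: beta_j = 0, using z = estimate / std_error ~ N(0, 1)."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def weighted_chisq_p_value(statistic, weights, n_sim=100_000, seed=0):
    """Monte Carlo estimate of P(sum_i w_i * X_i >= statistic).

    Assumption (for illustration only): the X_i are independent
    chi-squared variables with 1 degree of freedom each, and the
    weights w_i are supplied by the caller.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # A chi-squared(1) draw is the square of a standard normal draw.
        draw = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if draw >= statistic:
            exceed += 1
    return exceed / n_sim
```

For example, normal_approx_p_value(1.96, 1.0) is about 0.05, the familiar two-sided 5% level. The Monte Carlo route is used for the weighted sum simply because it avoids the specialised approximations mentioned above; its accuracy depends on n_sim.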


Summarizing:

- the standard errors and p-values for individual hypotheses of the form 
"beta_j=0" reported by summary.lmrob (in robustbase) are (robust) 
asymptotic approximations, which should be interpreted and used accordingly;
- if you're interested in nested linear hypotheses, there are some 
proposals in the literature to obtain robust p-values for robust tests, 
although they have not been implemented in robustbase yet (hopefully 
they will be in the near future).

Best,

Matias

--
______________________________________________________________
Matias Salibian-Barrera - Department of Statistics
University of British Columbia - matias at stat.ubc.ca
Phone: (604) 822-3410 - Fax: (604) 822-6960


Martin Maechler wrote:
>   [Oops! Written 6 hours ago, the following was accidentally not sent.]
> 
>>>>>> "Celso" == Celso Barros <celso.barros at gmail.com>
>>>>>>     on Wed, 5 Jul 2006 04:09:17 -0300 writes:
> 
>     Celso> When I run rlm to obtain robust standard errors, my output does not include
>     Celso> p-values. Is there any reason p-values should not be used in this case? 
> 
> yes (see also below).
> 
>     Celso> Is there an argument I could use in rlm so that the output does
>     Celso> include p-values?
> no.
> 
> What are the reasons?
> 
> How to properly do hypothesis testing in the context of robust
> regression has partly been an open research problem.  Whereas
> it has been solved in Elvezio Ronchetti's PhD thesis (1982)
> by tau-tests (see chapter 7 of Hampel, Rousseeuw, Ronchetti and
> Stahel (1986)), these are not (directly) related to standard
> errors and t-tests with some degrees of freedom.
> Hence they are not so intuitively explainable, and not entirely
> trivial to implement.  Probably this is one of the reasons why they
> (tau-tests) haven't been programmed for MASS (the book and the R package).
> 
> Recent research, namely,
>      Croux, C., Dhaene, G. and Hoorelbeke, D. (2003) _Robust standard
>      errors for robust estimators_, Discussion Papers Series 03.16,
>      K.U. Leuven, CES.
> has been made use of by Matias Salibian-Barrera's roblm()
> function, now available as lmrob() from package 'robustbase'.
> There,  mod <- lmrob(........);  summary( mod ) 
> does provide you with P-values.
> But we still recommend *not* to ``believe in the P-values''
> blindly, but rather base your data analysis on serious analysis
> of residuals and other model checking.
> 
> _______________________________________________
> R-SIG-Robust using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-robust