[RsR] [R] "hubers" function in R MASS library : problem andsolution

Mon Feb 7 16:58:20 CET 2011

Hi Marting,� Maheswaran:

Thank you very much for your input!� 

I am applying M-estimation in an automatic data processing procedure (that requires both robust mean and robust SD).� The "mad=0" problem happens rarely.� Most of the time "hubers" works great.� 

I appreciate your proposed solutions. 

If we use:

> hubers(a, k = 1.5, s=mean(abs(a-median(a))))

Then "s" is fixed and seems less robust:� if there are outliers, "s=mean(abs(a-median(a)))" will appear too large, right? � 

Similaryly, using the "robustbase package, 

>� huberM(a, s = max(mad(a), 1e-6)) 

"s" will be fixed as well and will be less efficient. 

Regarding Martin's comment: 

"but 'tol' is completely arbitrary and, the way you propose it makes the resulting estimate no-longer-scale-equivariant.",�� 

I did an experiment using different levels of "tol" and found them to give identical results.�� For convenience, I add a "s.min" parameter in "hubers" that bounds the initial "s" estimate from below:

� if (missing(s)) 
����� s0 <- max(mad(y), s.min)
� 
Then I tried the following settings:

> hubers(c(rep(1,10), 2), s.min=1e-6)
$mu
[1] 1

$s
[1] 7.958e-17

> hubers(c(rep(1,10), 2), s.min=1)
$mu
[1] 1

$s
[1] 7.958e-17

> hubers(c(rep(1,10), 2), s.min=10)
$mu
[1] 1

$s
[1] 7.958e-17

#####

> hubers(a, s.min=1e-6)
$mu
[1] 5.88

$s
[1] 2.937

> hubers(a, s.min=1)
$mu
[1] 5.88

$s
[1] 2.937

> hubers(a, s.min=10)
$mu
[1] 5.88

$s
[1] 2.937

However, forgive me for my ignorance about the practical implication of "scale-equivalent". 

Let me know if you have any comment.

Thanks again for your help!

Sincerely,
Feiming Chen

--- On Sun, 2/6/11, Maheswaran Rohan <mrohan using doc.govt.nz> wrote:

From: Maheswaran Rohan <mrohan using doc.govt.nz>
Subject: RE: [RsR] [R] "hubers" function in R MASS library : problem andsolution
To: "Martin Maechler" <maechler using stat.math.ethz.ch>, "Feiming Chen" <feimingchen using yahoo.com>
Cc: r-sig-robust using r-project.org
Date: Sunday, February 6, 2011, 3:23 PM

Hi Chen

>� With this change,� the result is more sensible:
� � >> hubers(a)
� � > $mu
� � > [1] 5.88
� � > $s
� � > [1] 2.937

This mu = 5.88 is very similar to the mean(a) = 5.877417. That means, no
point to apply M-estimation if you are happy with the result.

Following suggestion may be helpful, think about it!

To overcome mad = 0 problem, try to define s = mean(abs((a - median(a)))

> hubers(a, k = 1.5, s=mean(abs(a-median(a))))
$mu
[1] 6.414677

$s
[1] 1.689561

> hubers(a, k = 1.345, s=mean(abs(a-median(a))))
$mu
[1] 6.480148

$s
[1] 1.689561

Regards
Rohan

-----Original Message-----
From: r-sig-robust-bounces using r-project.org
[mailto:r-sig-robust-bounces using r-project.org] On Behalf Of Martin Maechler
Sent: Friday, 4 February 2011 10:05 p.m.
To: Feiming Chen
Cc: r-help using r-project.org; r-sig-robust using r-project.org
Subject: Re: [RsR] [R] "hubers" function in R MASS library : problem
andsolution

>>>>> Feiming Chen <feimingchen using yahoo.com>
>>>>>� ���on Thu, 3 Feb 2011 12:03:05 -0800 (PST) writes:

� � > Hello:
� � > I found the "hubers" function in MASS library is NOT working on
the following 
� � > data:

� � >> a <- 
� � >>
c(7.19,7.19,7.19,9.41,6.79,9.41,7.19,9.41,1.64,7.19,7.19,7.19,7.19,1.422
,7.19,6.79,7.19,6.79,7.19,7.19,4.44,6.55,6.79,7.19,9.41,9.41,7.19,7.19,7
.19,7.19,1.64,1.597,1.64,7.19,1.422,7.19,6.79,9.38,7.19,1.64,7.19,7.19,7
.19,7.19,7.19,1.64,7.19,6.79,6.79,1.649,1.64,7.19,1.597,1.64,6.55,7.19,7
.19,1.64,7.19,7.19,1.407,1.672,1.672,7.19,6.55,7.19,7.19,9.41,1.407,7.19
,7.19,9.41,7.19,9.41,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,9.41,7
.19,6.79,7.19,6.79,1.64,1.422,7.19,7.19,1.67,1.64,1.64,1.64,1.64,1.787,7
.19,7.19,7.19,6.79,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,1.64,1.6
4,1.64,1.422,9.41,1.64,7.19,7.19,7.19,7.19,7.19,7.19,7.19,6.79,6.79,7.19
,6.79,7.19,7.19,1.407,7.19,4.42,9,1.64,1.64,6.79,1.664,1.664)
� � >> 

� � >> library(MASS)
� � >> hubers(a)
� � > ## NO response!

� � > I think it is due to the infinite loop caused by the following
line in the code 
� � > of "hubers" (around Line 30):

� � > if ((abs(mu0 - mu1) < tol * s0) && 
� � > abs(s0 - s1) < tol * s0)� break

� � > where "s0" evaluates to ZERO initially (due to more than 50% of
the number 
� � > 7.19). 

yes.

Not only for this reason,� the robustbase� package 
has had the� 'huberM()' function with some other slight
advantages over MASS::huber.

� � > I propose to change the "<" sign to "<=":

� � > if ((abs(mu0 - mu1) <= tol * s0) && 
� � > abs(s0 - s1) <= tol * s0)� break

� � > which will break the infinite loop.� � However, the new result is:

� � >> hubers(a)
� � > $mu
� � > [1] 7.19

� � > $s
� � > [1] 0

� � > which gives 0 standard deviation.� Actually the data does vary and
it is not 
� � > true all values other than 7.19 are outliers.���

Sure. Nontheless, the way Peter Huber had defined the "proposal
2" Huber estimator,� s = 0, is the correct result.

With the robustbase� huberM() function, you (can) get

> huberM(a, warn0scale =TRUE)
$mu
[1] 7.19

$s
[1] 0

$it
[1] 0

Warning message:
In huberM(a, warn0scale = TRUE) :
� scale 's' is zero -- returning initial 'mu'

� � >> plot(a)

� � > I think this is because we allow initial SD to equal to zero
instead of a 
� � > POSITIVE value.���See Line 15 of the "hubers" function:

� � > if (missing(s))
� � > s0 <- mad(y)

� � > I propose setting "s0" to "mad(y)" or a small positive number,
whichever is 
� � > larger.� For example:

� � > if (missing(s))
� � > s0 <- max(mad(y), tol)

� � > where tol=1e-6.

but 'tol' is completely arbitrary and, the way you propose it
makes the resulting estimate 
no-longer-scale-equivariant.

huberM() *has* an� s� argument for specifying the scale estimate,
so you could use it as

� huberM(a, s = max(mad(a), 1e-6))� 

if you want.

Note that your sample 'a' is constructed in a way that all
scale-equivariant 50%-breakpoint robust estimates of scale will return s
= 0,
as more than half of your observations are identical,
and scale equivariance "ensures" that in this limiting case, indeed all
other observations are "outliers".

This last point is a somewhat interesting topic for
"robustniks",
and hence I'm CC'ing the dedicated "R + Robustness" mailing
list, R-SIG-robust.

Martin Maechler, ETH Zurich

� � >� With this change,� the result is more sensible:
� � >> hubers(a)
� � > $mu
� � > [1] 5.88
� � > $s
� � > [1] 2.937

� � > Could anyone take a look at this and decide if the above
modifications to the 
� � > "hubers" function are warranted?Thanks a lot! 

� � > Sincerely, 
� � > Feiming Chen

� � > Read more >>���Options >>���
� � > [[alternative HTML version deleted]]

� � > ______________________________________________
� � > R-help using r-project.org mailing list
� � > https://stat.ethz.ch/mailman/listinfo/r-help
� � > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
� � > and provide commented, minimal, self-contained, reproducible code.

_______________________________________________
R-SIG-Robust using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust
##############################################
This e-mail (and attachments) is confidential
and may be legally privileged.
##############################################

	[[alternative HTML version deleted]]