[RsR] [R] "hubers" function in R MASS library : problem andsolution
Maheswaran Rohan
mroh@n @end|ng |rom doc@govt@nz
Sun Feb 6 22:23:30 CET 2011
Hi Chen
> With this change, the result is more sensible:
>> hubers(a)
> $mu
> [1] 5.88
> $s
> [1] 2.937
This mu = 5.88 is very similar to the mean(a) = 5.877417. That means, no
point to apply M-estimation if you are happy with the result.
Following suggestion may be helpful, think about it!
To overcome mad = 0 problem, try to define s = mean(abs((a - median(a)))
> hubers(a, k = 1.5, s=mean(abs(a-median(a))))
$mu
[1] 6.414677
$s
[1] 1.689561
> hubers(a, k = 1.345, s=mean(abs(a-median(a))))
$mu
[1] 6.480148
$s
[1] 1.689561
Regards
Rohan
-----Original Message-----
From: r-sig-robust-bounces using r-project.org
[mailto:r-sig-robust-bounces using r-project.org] On Behalf Of Martin Maechler
Sent: Friday, 4 February 2011 10:05 p.m.
To: Feiming Chen
Cc: r-help using r-project.org; r-sig-robust using r-project.org
Subject: Re: [RsR] [R] "hubers" function in R MASS library : problem
andsolution
>>>>> Feiming Chen <feimingchen using yahoo.com>
>>>>> on Thu, 3 Feb 2011 12:03:05 -0800 (PST) writes:
> Hello:
> I found the "hubers" function in MASS library is NOT working on
the following
> data:
>> a <-
>>
c(7.19,7.19,7.19,9.41,6.79,9.41,7.19,9.41,1.64,7.19,7.19,7.19,7.19,1.422
,7.19,6.79,7.19,6.79,7.19,7.19,4.44,6.55,6.79,7.19,9.41,9.41,7.19,7.19,7
.19,7.19,1.64,1.597,1.64,7.19,1.422,7.19,6.79,9.38,7.19,1.64,7.19,7.19,7
.19,7.19,7.19,1.64,7.19,6.79,6.79,1.649,1.64,7.19,1.597,1.64,6.55,7.19,7
.19,1.64,7.19,7.19,1.407,1.672,1.672,7.19,6.55,7.19,7.19,9.41,1.407,7.19
,7.19,9.41,7.19,9.41,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,9.41,7
.19,6.79,7.19,6.79,1.64,1.422,7.19,7.19,1.67,1.64,1.64,1.64,1.64,1.787,7
.19,7.19,7.19,6.79,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,1.64,1.6
4,1.64,1.422,9.41,1.64,7.19,7.19,7.19,7.19,7.19,7.19,7.19,6.79,6.79,7.19
,6.79,7.19,7.19,1.407,7.19,4.42,9,1.64,1.64,6.79,1.664,1.664)
>>
>> library(MASS)
>> hubers(a)
> ## NO response!
> I think it is due to the infinite loop caused by the following
line in the code
> of "hubers" (around Line 30):
> if ((abs(mu0 - mu1) < tol * s0) &&
> abs(s0 - s1) < tol * s0) break
> where "s0" evaluates to ZERO initially (due to more than 50% of
the number
> 7.19).
yes.
Not only for this reason, the robustbase package
has had the 'huberM()' function with some other slight
advantages over MASS::huber.
> I propose to change the "<" sign to "<=":
> if ((abs(mu0 - mu1) <= tol * s0) &&
> abs(s0 - s1) <= tol * s0) break
> which will break the infinite loop. However, the new result is:
>> hubers(a)
> $mu
> [1] 7.19
> $s
> [1] 0
> which gives 0 standard deviation. Actually the data does vary and
it is not
> true all values other than 7.19 are outliers.
Sure. Nontheless, the way Peter Huber had defined the "proposal
2" Huber estimator, s = 0, is the correct result.
With the robustbase huberM() function, you (can) get
> huberM(a, warn0scale =TRUE)
$mu
[1] 7.19
$s
[1] 0
$it
[1] 0
Warning message:
In huberM(a, warn0scale = TRUE) :
scale 's' is zero -- returning initial 'mu'
>> plot(a)
> I think this is because we allow initial SD to equal to zero
instead of a
> POSITIVE value. See Line 15 of the "hubers" function:
> if (missing(s))
> s0 <- mad(y)
> I propose setting "s0" to "mad(y)" or a small positive number,
whichever is
> larger. For example:
> if (missing(s))
> s0 <- max(mad(y), tol)
> where tol=1e-6.
but 'tol' is completely arbitrary and, the way you propose it
makes the resulting estimate
no-longer-scale-equivariant.
huberM() *has* an s argument for specifying the scale estimate,
so you could use it as
huberM(a, s = max(mad(a), 1e-6))
if you want.
Note that your sample 'a' is constructed in a way that all
scale-equivariant 50%-breakpoint robust estimates of scale will return s
= 0,
as more than half of your observations are identical,
and scale equivariance "ensures" that in this limiting case, indeed all
other observations are "outliers".
This last point is a somewhat interesting topic for
"robustniks",
and hence I'm CC'ing the dedicated "R + Robustness" mailing
list, R-SIG-robust.
Martin Maechler, ETH Zurich
> With this change, the result is more sensible:
>> hubers(a)
> $mu
> [1] 5.88
> $s
> [1] 2.937
> Could anyone take a look at this and decide if the above
modifications to the
> "hubers" function are warranted?Thanks a lot!
> Sincerely,
> Feiming Chen
> Read more >> Options >>
> [[alternative HTML version deleted]]
> ______________________________________________
> R-help using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
_______________________________________________
R-SIG-Robust using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust
##############################################
This e-mail (and attachments) is confidential
and may be legally privileged.
##############################################
More information about the R-SIG-Robust
mailing list