[R] "hubers" function in R MASS library : problem and solution

Martin Maechler maechler at stat.math.ethz.ch
Fri Feb 4 10:05:22 CET 2011


>>>>> Feiming Chen <feimingchen at yahoo.com>
>>>>>     on Thu, 3 Feb 2011 12:03:05 -0800 (PST) writes:

    > Hello:
    > I found the "hubers" function in MASS library is NOT working on the following 
    > data:

    >> a <- 
    >> c(7.19,7.19,7.19,9.41,6.79,9.41,7.19,9.41,1.64,7.19,7.19,7.19,7.19,1.422,7.19,6.79,7.19,6.79,7.19,7.19,4.44,6.55,6.79,7.19,9.41,9.41,7.19,7.19,7.19,7.19,1.64,1.597,1.64,7.19,1.422,7.19,6.79,9.38,7.19,1.64,7.19,7.19,7.19,7.19,7.19,1.64,7.19,6.79,6.79,1.649,1.64,7.19,1.597,1.64,6.55,7.19,7.19,1.64,7.19,7.19,1.407,1.672,1.672,7.19,6.55,7.19,7.19,9.41,1.407,7.19,7.19,9.41,7.19,9.41,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,9.41,7.19,6.79,7.19,6.79,1.64,1.422,7.19,7.19,1.67,1.64,1.64,1.64,1.64,1.787,7.19,7.19,7.19,6.79,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,7.19,1.64,1.64,1.64,1.422,9.41,1.64,7.19,7.19,7.19,7.19,7.19,7.19,7.19,6.79,6.79,7.19,6.79,7.19,7.19,1.407,7.19,4.42,9,1.64,1.64,6.79,1.664,1.664)
    >> 

    >> library(MASS)
    >> hubers(a)
    > ## NO response!

    > I think it is due to the infinite loop caused by the following line in the code 
    > of "hubers" (around Line 30):

    > if ((abs(mu0 - mu1) < tol * s0) && 
    > abs(s0 - s1) < tol * s0)  break

    > where "s0" evaluates to ZERO initially (due to more than 50% of the number 
    > 7.19). 

yes.

Not only for this reason,  the robustbase  package 
has had the  'huberM()' function with some other slight
advantages over MASS::huber.




    > I propose to change the "<" sign to "<=":

    > if ((abs(mu0 - mu1) <= tol * s0) && 
    > abs(s0 - s1) <= tol * s0)  break

    > which will break the infinite loop.    However, the new result is:

    >> hubers(a)
    > $mu
    > [1] 7.19

    > $s
    > [1] 0

    > which gives 0 standard deviation.  Actually the data does vary and it is not 
    > true all values other than 7.19 are outliers.   

Sure. Nontheless, the way Peter Huber had defined the "proposal
2" Huber estimator,  s = 0, is the correct result.

With the robustbase  huberM() function, you (can) get

> huberM(a, warn0scale =TRUE)
$mu
[1] 7.19

$s
[1] 0

$it
[1] 0

Warning message:
In huberM(a, warn0scale = TRUE) :
  scale 's' is zero -- returning initial 'mu'


    >> plot(a)

    > I think this is because we allow initial SD to equal to zero instead of a 
    > POSITIVE value.   See Line 15 of the "hubers" function:

    > if (missing(s))
    > s0 <- mad(y)

    > I propose setting "s0" to "mad(y)" or a small positive number, whichever is 
    > larger.  For example:

    > if (missing(s))
    > s0 <- max(mad(y), tol)

    > where tol=1e-6.

but 'tol' is completely arbitrary and, the way you propose it
makes the resulting estimate 
no-longer-scale-equivariant.

huberM() *has* an  s  argument for specifying the scale estimate,
so you could use it as

  huberM(a, s = max(mad(a), 1e-6))  

if you want.

Note that your sample 'a' is constructed in a way that all
scale-equivariant 50%-breakpoint robust estimates of scale will return s = 0,
as more than half of your observations are identical,
and scale equivariance "ensures" that in this limiting case, indeed all
other observations are "outliers".

This last point is a somewhat interesting topic for
"robustniks",
and hence I'm CC'ing the dedicated "R + Robustness" mailing
list, R-SIG-robust.

Martin Maechler, ETH Zurich


    >  With this change,  the result is more sensible:
    >> hubers(a)
    > $mu
    > [1] 5.88
    > $s
    > [1] 2.937

    > Could anyone take a look at this and decide if the above modifications to the 
    > "hubers" function are warranted?Thanks a lot! 


    > Sincerely, 
    > Feiming Chen

    > Read more >>   Options >>   
    > [[alternative HTML version deleted]]

    > ______________________________________________
    > R-help at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list