[R-SIG-Finance] Winsorization

Thu Sep 18 12:00:57 CEST 2008

I disagree with Ajay about the value of Winsorization.
Yes, it is ad hoc but it is simple to understand and
often results in reasonable answers.

It certainly depends on the context but if we are talking
about financial returns, then I haven't had positive
experience with traditional statistical robustness. 
(Given that my thesis was on robustness, I don't say
this lightly.)  Robustness often gives inferior answers
in finance (in my experience) even when it is obvious
that it "should" be the proper thing to do.  This is
a phenomenon that I don't understand.

The code that Ajay gives always clips a fixed fraction
of the data in each tail.  Winsorization is often thought of
instead as clipping only data that are too far from the center.
A simple version of this is:

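# Pull in only points more than `winsorize` MADs from the median.
# mad() is scaled by default (constant 1.4826) to estimate the SD for normal data.
winsorize.mad <- function(x, winsorize=5)   # illustrative name
{
    center <- median(x, na.rm=TRUE)
    s <- mad(x, na.rm=TRUE) * winsorize     # half-width of the allowed band
    top <- center + s
    bot <- center - s
    x[x > top] <- top                       # clip the upper tail
    x[x < bot] <- bot                       # clip the lower tail
    x
}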
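To make the contrast between the two approaches concrete, here is a small sketch in R. The helper names clip_quantile() and clip_mad() are hypothetical, and the 5% cutoff and 5-MAD band are illustrative choices. On data with no outliers the quantile rule still alters the two extreme points, while the MAD rule leaves everything alone; with one gross outlier the MAD rule pulls in only that point.

```r
# Two ways to Winsorize (hypothetical helper names)
clip_quantile <- function(x, cutoff = 0.05) {
  # Always pulls in whatever lies beyond the cutoff quantiles
  q <- quantile(x, probs = c(cutoff, 1 - cutoff), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}
clip_mad <- function(x, k = 5) {
  # Only pulls in points more than k MADs from the median
  center <- median(x, na.rm = TRUE)
  s <- k * mad(x, na.rm = TRUE)
  pmin(pmax(x, center - s), center + s)
}

clean <- 1:20                 # no outliers at all
dirty <- c(1:19, 1000)        # one gross outlier

n_changed <- function(x, y) sum(x != y)
n_changed(clean, clip_quantile(clean))   # 2: both extremes are altered
n_changed(clean, clip_mad(clean))        # 0: nothing is far from the center
n_changed(dirty, clip_mad(dirty))        # 1: only the 1000 is pulled in
```

The quantile rule changes a fixed share of the data no matter what the data look like; the MAD rule only acts when something is genuinely far out.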

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Ajay Shah wrote:
> On Thu, Sep 18, 2008 at 11:29:19AM +0800, ?????? wrote:
>   
>> Dear all,
>>        I am dealing with a data set with many outlier values. I have read
>> that a technique called winsorization (or winsorising) can
>> reduce the influence of those extreme values. Has anyone used this technique
>> before? And how can it be done in S+ or R? Thank you.
>>     
>
> Winsorisation is not a great idea. It is an ad hoc procedure. Your test
> statistics are all suspect if you have preprocessed the data in this
> fashion.
>
> If you can do robust regressions (e.g. using the R package `robust'),
> that is far better. Get on the r-sig-robust mailing list and start
> learning! (At least, that's what I'm doing.)
>
> If you must do it, here's some code:
>
> winsorise <- function(x, cutoff=0.01) {
>   stopifnot(length(x)>0, cutoff>0, cutoff<0.5)
>   osd <- sd(x, na.rm=TRUE)                 # SD before Winsorising
>   # Lower and upper cutoff quantiles
>   values <- quantile(x, probs=c(cutoff, 1-cutoff), na.rm=TRUE)
>   winsorised.left <- x < values[1]         # flags for the lower tail
>   winsorised.right <- x > values[2]        # From here on, I start writing into x
>   x[winsorised.left] <- values[1]
>   x[winsorised.right] <- values[2]
>   list(winsorised=x,
>        values=values,
>        osd=osd, nsd=sd(x, na.rm=TRUE),     # SD after Winsorising
>        winsorised.left=winsorised.left, winsorised.right=winsorised.right)
> }
>
>
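Ajay's suggestion of fitting a robust regression instead of preprocessing the data can be sketched as follows. This uses rlm() from the MASS package (shipped with R) as a stand-in for the `robust' package he mentions, and the data are made up for illustration. A single outlier drags the least-squares slope well away from the true value of 2, while the Huber M-estimate stays close to it and gives the outlier a small weight.

```r
library(MASS)  # recommended package shipped with R; provides rlm()

x <- 1:20
y <- 2 * x + rep(c(-1, 1), 10)   # true slope 2, small alternating noise
y[20] <- 200                     # one gross outlier (would otherwise be 41)

fit_ls  <- lm(y ~ x)             # ordinary least squares
fit_rlm <- rlm(y ~ x)            # Huber M-estimator (rlm's default psi)

coef(fit_ls)[["x"]]    # pulled well above 2 by the single outlier
coef(fit_rlm)[["x"]]   # stays close to 2
fit_rlm$w[20]          # IWLS weight given to the outlier, far below 1
```

Here the data are left untouched; the estimator itself decides how much each point counts.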