[R-SIG-Finance] Winsorization

Ajay Shah ajayshah at mayin.org
Thu Sep 18 06:31:00 CEST 2008


On Thu, Sep 18, 2008 at 11:29:19AM +0800, ?????? wrote:
> Dear all,
>        I am dealing with a data set with many outlier values. It is said
> that a technique named winsorization (or winsorising) can
> reduce the influence of those extreme values. Has anyone used this technique
> before? And how can it be done in S+ or R? Thank you.

Winsorisation is not a great idea. It is an ad hoc procedure. Your test
statistics are all suspect if you have preprocessed the data in this
fashion.

If you can do robust regressions (e.g. use the R package `robust')
that is far better. Get on the r-sig-robust mailing list and start
learning! (At least, that's what I'm doing).
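
For a flavour of what a robust fit looks like, here is a minimal sketch on
made-up data. It assumes rlm() from the MASS package (Huber M-estimation)
rather than the `robust' package named above, only because MASS ships with
standard R installations. The data, the contamination and the object names
are all invented for illustration.

```r
library(MASS)                        # provides rlm(); assumed to be installed

set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2*x + rnorm(n, sd=0.5)      # simulated data, true slope is 2

# Contaminate the 10 observations with the largest x (illustrative choice)
bad <- order(x, decreasing=TRUE)[1:10]
y[bad] <- y[bad] - 20

d <- data.frame(x=x, y=y)
fit.ols <- lm(y ~ x, data=d)              # ordinary least squares
fit.rlm <- rlm(y ~ x, data=d, maxit=50)   # Huber M-estimator (rlm default)

# Compare the two slope estimates against the true value of 2
print(rbind(ols=coef(fit.ols), rlm=coef(fit.rlm)))
```

The outliers stay in the data; the robust fit simply gives them less weight,
so nothing is preprocessed before the estimation.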

If you must winsorise, here's some code:

winsorise <- function(x, cutoff=0.01) {
  # Clamp values outside the [cutoff, 1-cutoff] quantiles to those quantiles.
  stopifnot(length(x)>0, cutoff>0, cutoff<0.5)
  osd <- sd(x, na.rm=TRUE)              # sd before winsorising
  values <- quantile(x, probs=c(cutoff,1-cutoff), na.rm=TRUE)
  winsorised.left <- !is.na(x) & x<values[1]    # NAs are never flagged
  winsorised.right <- !is.na(x) & x>values[2]   # From here on, I start writing into x
  x[winsorised.left] <- values[1]
  x[winsorised.right] <- values[2]
  list(winsorised=x,
       values=values,
       osd=osd, nsd=sd(x, na.rm=TRUE),  # sd after winsorising
       winsorised.left=winsorised.left, winsorised.right=winsorised.right)
}
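
The same clamping can also be written in a couple of lines with pmin() and
pmax(). The sketch below is self-contained; winsorise2 is a hypothetical
name (not from any package) and the "returns" are simulated, with two gross
outliers planted by hand.

```r
# winsorise2 is a hypothetical helper, not part of any package.
winsorise2 <- function(x, cutoff=0.01) {
  stopifnot(length(x)>0, cutoff>0, cutoff<0.5)
  lim <- quantile(x, probs=c(cutoff, 1-cutoff), na.rm=TRUE, names=FALSE)
  pmin(pmax(x, lim[1]), lim[2])      # clamp into [lim[1], lim[2]]; NAs stay NA
}

set.seed(42)
r <- c(rnorm(998), -25, 40)          # made-up data with two planted outliers
w <- winsorise2(r, cutoff=0.01)

print(range(r))                      # -25 and 40, the planted outliers
print(range(w))                      # the 1% and 99% sample quantiles of r
print(c(before=sd(r), after=sd(w)))  # sd shrinks once the tails are clamped
```

Unlike the longer function, this variant returns only the clamped vector and
does not record which observations were changed.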

-- 
Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah at mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.


