Ajay Shah ajayshah at mayin.org
Thu Sep 18 06:31:00 CEST 2008

On Thu, Sep 18, 2008 at 11:29:19AM +0800, ?????? wrote:
> Dear all,
>        I am dealing with a data set with many outlier values, and it is said
> that a technique named winsorization (or winsorising) can
> reduce the influence of those extreme values. Has anyone used this technique
> before? And how can it be done in S+ or R? Thank you.

Winsorisation is not a great idea. It is an ad hoc procedure. Your test
statistics are all suspect if you have preprocessed the data in this way,
since their usual distributions do not account for the data-dependent clipping.

If you can do robust regression instead (e.g. use the R package `robust'),
that is far better. Get on the r-sig-robust mailing list and start
learning! (At least, that's what I'm doing).
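A minimal sketch of that alternative, using `rlm()` from the MASS package (which ships with R) as a stand-in for the `robust' package; the simulated data and the 60-unit contamination are made up purely for illustration. Instead of clipping the data beforehand, the M-estimator gives large residuals small weights while fitting:

```r
# Sketch: robust (Huber M-estimator) regression vs. ordinary least squares.
# Assumption: MASS::rlm stands in for the `robust' package; data are simulated.
library(MASS)

set.seed(42)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)      # true intercept 2, slope 3
y[1:5] <- y[1:5] + 60          # contaminate five observations

ols <- lm(y ~ x)               # least squares: every point gets full weight
rob <- rlm(y ~ x)              # Huber M-estimation (psi.huber is the default)

print(coef(ols))
print(coef(rob))
# Final IWLS weights of the contaminated points (small, so they barely count)
print(round(rob$w[1:5], 3))
```

Comparing the two sets of coefficients shows how far the five bad points drag the least-squares fit, with no preprocessing of the data needed.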

If you must do it, here's some code:

winsorise <- function(x, cutoff=0.01) {
  stopifnot(length(x)>0, cutoff>0, cutoff<0.5)
  osd <- sd(x, na.rm=TRUE)                 # SD before winsorising
  values <- quantile(x, probs=c(cutoff,1-cutoff), na.rm=TRUE)
  # Flag the observations in each tail (NAs are never flagged)
  winsorised.left <- !is.na(x) & x<values[1]
  winsorised.right <- !is.na(x) & x>values[2]
  # From here on, I start writing into x
  x[winsorised.left] <- values[1]
  x[winsorised.right] <- values[2]
  list(x=x,
       osd=osd, nsd=sd(x, na.rm=TRUE),
       winsorised.left=winsorised.left, winsorised.right=winsorised.right)
}
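The same clipping can also be written with base R's vectorised `pmin()`/`pmax()`. This is a minimal sketch, not a replacement for the function above: `winsorise_clamp` is a hypothetical name, it assumes a numeric `x`, and it returns only the clipped vector (no before/after SDs or flags). The simulated data are made up for illustration.

```r
# Sketch: clamp x to its empirical [cutoff, 1 - cutoff] quantiles.
# winsorise_clamp is a hypothetical helper; NA values stay NA.
winsorise_clamp <- function(x, cutoff = 0.01) {
  stopifnot(is.numeric(x), length(x) > 0, cutoff > 0, cutoff < 0.5)
  lims <- quantile(x, probs = c(cutoff, 1 - cutoff), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, lims[1]), lims[2])   # raise the low tail, then cap the high tail
}

set.seed(1)
x <- c(rnorm(98), -50, 50)   # two gross outliers
y <- winsorise_clamp(x, cutoff = 0.05)
print(range(x))
print(range(y))
```

Note that the quantiles are themselves estimated from the data, which is exactly why test statistics computed afterwards are suspect.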

