[R-SIG-Finance] Winsorization
Ajay Shah
ajayshah at mayin.org
Thu Sep 18 06:31:00 CEST 2008
On Thu, Sep 18, 2008 at 11:29:19AM +0800, ?????? wrote:
> Dear all,
> I am dealing with a data set with many outlying values. It is said
> that a technique named winsorization or winsorising can
> reduce the influence of those extreme values. Has anyone used this
> technique before? And how can it be done in S+ or R? Thank you.
Winsorisation is not a great idea. It is an ad hoc procedure, and your
test statistics are all suspect if you have preprocessed the data in
this fashion, because the usual standard errors take no account of the
clipping.
If you can do robust regressions (e.g. use the R package `robust')
that is far better. Get on the r-sig-robust mailing list and start
learning! (At least, that's what I'm doing).
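To make the contrast concrete, here is a minimal sketch on simulated data.
It is an assumption-laden illustration: it uses rlm() from the MASS
package (shipped with standard R installations) rather than the `robust'
package named above, and the data, the contamination and the object names
are all invented for the example.

```r
library(MASS)   # assumption: MASS::rlm stands in for the `robust' package

set.seed(42)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)   # true intercept 1, true slope 2
y[1:5] <- y[1:5] + 30                   # contaminate five observations

fit.ols <- lm(y ~ x)    # ordinary least squares, pulled by the outliers
fit.rob <- rlm(y ~ x)   # M-estimation, Huber psi by default

# Compare the two sets of coefficients side by side
print(rbind(ols = coef(fit.ols), robust = coef(fit.rob)))
```

The outliers stay in the data; the estimator simply gives them less
weight, so nothing has been preprocessed behind the back of the test
statistics.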
If you must do it, here's some code:
winsorise <- function(x, cutoff=0.01) {
  # cutoff is the fraction clipped in each tail, so it must lie in (0, 0.5)
  stopifnot(length(x)>0, cutoff>0, cutoff<0.5)
  osd <- sd(x, na.rm=TRUE)             # sd before winsorising
  values <- quantile(x, probs=c(cutoff,1-cutoff), na.rm=TRUE)
  winsorised.left <- x<values[1]       # NA wherever x is NA
  winsorised.right <- x>values[2]
  # From here on, I start writing into x
  x[winsorised.left] <- values[1]
  x[winsorised.right] <- values[2]
  list(winsorised=x,
       values=values,
       osd=osd, nsd=sd(x, na.rm=TRUE), # sd after winsorising
       winsorised.left=winsorised.left, winsorised.right=winsorised.right)
}
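As a quick check of what the clipping does, here is a small self-contained
sketch of the same operation written with pmin() and pmax(). The simulated
vector, the two planted outliers and the names bounds and x.w are
assumptions made up for the illustration.

```r
set.seed(1)
x <- c(rnorm(98), -50, 50)   # 98 ordinary values plus two planted outliers

cutoff <- 0.01
bounds <- unname(quantile(x, probs = c(cutoff, 1 - cutoff), na.rm = TRUE))

# Clamp everything below the lower quantile up to it, and everything
# above the upper quantile down to it
x.w <- pmin(pmax(x, bounds[1]), bounds[2])

# Standard deviation before and after clipping
print(c(before = sd(x), after = sd(x.w)))
print(range(x.w))
```

The number of observations is unchanged; only the values in the two tails
are moved to the cutoff quantiles, which is why the standard deviation
shrinks.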
--
Ajay Shah http://www.mayin.org/ajayshah
ajayshah at mayin.org http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.