[R] Winsorizing Multiple Variables
Michael Conklin
michael.conklin at markettools.com
Fri Jan 16 22:24:36 CET 2009
Don't sort y; leave it in the original order (y<-x). Calculate xbot and xtop using
xtemp<-quantile(y,c(tr,1-tr),na.rm=na.rm)
xbot<-xtemp[1]
xtop<-xtemp[2]
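Putting that suggestion together, a minimal sketch of an order-preserving version might look like the following. The name win_keep_order, the seed, and the small example data frame are made up for illustration and are not from the original post. Note also that quantile() interpolates by default, so the cut points can differ slightly from the order-statistic values (y[ibot], y[itop]) in the original function.

```r
# Sketch only: winsorize a vector without reordering it.
# win_keep_order is a hypothetical name, not part of any package.
win_keep_order <- function(x, tr = 0.2, na.rm = FALSE) {
  # Cut points from the unsorted data; quantile() stops with an error
  # if x contains NA and na.rm is FALSE.
  bounds <- quantile(x, c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  # Clamp each value to [bounds[1], bounds[2]]; positions (and any NAs)
  # stay where they were.
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Illustrative data, assumed for this example.
set.seed(1)  # arbitrary seed, only so the example is reproducible
dat <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5))

# Winsorize only the measurement columns, leaving the id column alone.
vars <- c("var1", "var2")
dat_w <- dat
dat_w[vars] <- lapply(dat[vars], win_keep_order)

# Rows still line up, so the winsorized variables can be correlated.
cor(dat_w$var1, dat_w$var2)
```

Because lapply() returns the columns in their original row order, assigning back into dat_w[vars] keeps every observation on its own row.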
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Karl Healey
Sent: Friday, January 16, 2009 2:51 PM
To: r-help at r-project.org
Subject: [R] Winsorizing Multiple Variables
Hi All,
I want to take a matrix (or data frame) and winsorize each variable.
So I can, for example, correlate the winsorized variables.
The code below will winsorize a single vector, but when applied to
several vectors, each ends up sorted independently in ascending order
so that a given observation is no longer on the same row for each
vector.
So I need to winsorize the variable but then return it to its original
order. Or another solution that will take a data frame, winsorize each
variable, and return a new data frame with all the variables in the
original order.
Thanks for any help!
-Karl
#The function I'm working from
win<-function(x,tr=.2,na.rm=F){
if(na.rm)x<-x[!is.na(x)]
y<-sort(x)
n<-length(x)
ibot<-floor(tr*n)+1
itop<-length(x)-ibot+1
xbot<-y[ibot]
xtop<-y[itop]
y<-ifelse(y<=xbot,xbot,y)
y<-ifelse(y>=xtop,xtop,y)
win<-y
win
}
#Produces an example data frame, ss is the observation id, var1-var4
#are the variables I want to winsorize.
ss=c(1:5);var1=rnorm(5);var2=rnorm(5);var3=rnorm(5);var4=rnorm(5)
as.data.frame(cbind(ss,var1,var2,var3,var4))->data
data
#Winsorizes each variable, but sorts them independently so the
#observations no longer line up.
sapply(data,win)
___________________________
M. Karl Healey
Ph.D. Student
Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON
M5S 3G3
karl at psych.utoronto.ca
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.