[R] Winsorizing Multiple Variables

William Revelle lists at revelle.net
Sat Jan 17 01:41:07 CET 2009


Thanks to Michael for giving a nice solution to Karl's question.

This identified a bug in the psych package winsor function, which has 
now been fixed in version 1.0.63 (the current development version). 
Although my winsor.means function in 1.0.62 (and earlier) worked 
correctly, my winsor function gave an incorrect result when applied 
to matrices or data.frames.
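
The order-preserving approach can be sketched roughly as follows, using the quantile-based cutoffs Michael suggests in the quoted message below. This is only an illustration and not the code in the psych package: the name win_keep_order and the example data are hypothetical, and the cutoffs come from quantile() with its default interpolation, so they can differ slightly from the order-statistic cutoffs in Karl's function.

```r
# Hypothetical helper (not from psych): winsorize a vector in place,
# i.e. without sorting it, so observations keep their original rows.
win_keep_order <- function(x, tr = 0.2, na.rm = FALSE) {
  # Lower and upper cutoffs, as in Michael's suggestion.
  # quantile() stops with an error on NA unless na.rm = TRUE.
  cuts <- quantile(x, probs = c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  # Clamp values outside the cutoffs; any NA stays NA in its position.
  pmin(pmax(x, cuts[1]), cuts[2])
}

# Made-up example data: ss is the observation id.
set.seed(1)
dat <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5))

# Winsorize every column except the id, keeping the data frame shape.
wins <- dat
wins[-1] <- lapply(dat[-1], win_keep_order)

# Rows still line up, so the winsorized variables can be correlated.
cor(wins$var1, wins$var2)
```

Because nothing is sorted, each row of wins still refers to the same observation as the matching row of dat.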

Bill





At 1:24 PM -0800 1/16/09, Michael Conklin wrote:
>Don't sort y. Calculate xbot and xtop using
>xtemp<-quantile(y,c(tr,1-tr),na.rm=na.rm)
>xbot<-xtemp[1]
>xtop<-xtemp[2]
>
>-----Original Message-----
>From: r-help-bounces at r-project.org 
>[mailto:r-help-bounces at r-project.org] On Behalf Of Karl Healey
>Sent: Friday, January 16, 2009 2:51 PM
>To: r-help at r-project.org
>Subject: [R] Winsorizing Multiple Variables
>
>Hi All,
>
>I want to take a matrix (or data frame) and winsorize each variable
>so that I can, for example, correlate the winsorized variables.
>
>The code below will winsorize a single vector, but when applied to
>several vectors, each ends up sorted independently in ascending order
>so that a given observation is no longer on the same row for each
>vector.
>
>So I need to winsorize the variable but then return it to its original
>order, or find another solution that will take a data frame, winsorize
>each variable, and return a new data frame with all the variables in
>the original order.
>
>Thanks for any help!
>
>-Karl
>
>
>#The function I'm working from
>
>win<-function(x,tr=.2,na.rm=F){
>
>     if(na.rm)x<-x[!is.na(x)]
>     y<-sort(x)
>     n<-length(x)
>     ibot<-floor(tr*n)+1
>     itop<-length(x)-ibot+1
>     xbot<-y[ibot]
>     xtop<-y[itop]
>     y<-ifelse(y<=xbot,xbot,y)
>     y<-ifelse(y>=xtop,xtop,y)
>     win<-y
>     win
>}
>
>#Produces an example data frame, ss is the observation id, vars 1-4
>#are the variables I want to winsorize.
>
>ss=c(1:5);var1=rnorm(5);var2=rnorm(5);var3=rnorm(5);var4=rnorm(5)
>as.data.frame(cbind(ss,var1,var2,var3,var4))->data
>data
>
>#Winsorizes each variable, but sorts them independently so the
>#observations no longer line up.
>
>sapply(data,win)
>
>
>___________________________
>M. Karl Healey
>Ph.D. Student
>
>Department of Psychology
>University of Toronto
>Sidney Smith Hall
>100 St. George Street
>Toronto, ON
>M5S 3G3
>
>karl at psych.utoronto.ca
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


-- 
William Revelle		http://personality-project.org/revelle.html
Professor			http://personality-project.org/personality.html
Department of Psychology             http://www.wcas.northwestern.edu/psych/
Northwestern University	http://www.northwestern.edu/
Attend  ISSID/ARP:2009               http://issid.org/issid.2009/


