[R] remove extreme values or winsorize loop - dataframe
Cecilia Carmo
cecilia.carmo at ua.pt
Sun Aug 1 03:39:21 CEST 2010
Hi everyone!
I need a loop or a function that creates an X2 variable
that is X1 without the extreme values (or X1 winsorized)
by industry and year.
My reproducible example:
# 1000 firms observed over 10 years = 10000 rows
firm<-rep(1:1000,each=10)
year<-rep(1998:2007,1000)
# one industry code (1 to 10) per firm, repeated to 10000 rows
industry<-rep(rep(1:10,each=10),100)
X1<-rnorm(10000)
data<-data.frame(firm,industry,year,X1)
head(data)
The way I'm doing this is very laborious. I split my sample by
industry and year; for each industry and year I calculate
the 10% and 90% quantiles, then I create an X2 variable
like this:
# industry 1, year 1999: clamp X1 to its 10% and 90% quantiles
industry1<-subset(data,industry==1)
ind1year1999<-subset(industry1,year==1999)
q10<-quantile(ind1year1999$X1,probs=0.10,na.rm=TRUE)
q90<-quantile(ind1year1999$X1,probs=0.90,na.rm=TRUE)
ind1year1999winsorized<-transform(ind1year1999,X2=ifelse(X1<q10,q10,ifelse(X1>q90,q90,X1)))
# industry 1, year 2000: same steps
ind1year2000<-subset(industry1,year==2000)
q10<-quantile(ind1year2000$X1,probs=0.10,na.rm=TRUE)
q90<-quantile(ind1year2000$X1,probs=0.90,na.rm=TRUE)
ind1year2000winsorized<-transform(ind1year2000,X2=ifelse(X1<q10,q10,ifelse(X1>q90,q90,X1)))
I repeat this for all years and industries, and then I
merge/bind all again to have a new dataframe with all the
columns of the dataframe «data» plus X2.
Could anyone help me do this in an easier way?
Thanks
Cecília Carmo
Universidade de Aveiro - Portugal
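
For illustration only, the split / quantile / recombine steps described in the post could probably be collapsed into a single grouped call with base R's ave(). This is a minimal sketch, not a method taken from the thread: the helper names winsorize and trim and the column X2_trimmed are hypothetical, and the block rebuilds the example data (with a fixed seed) so that it runs on its own.

```r
# Hypothetical helper (not from the original post): clamp x to its
# lower and upper quantiles. NA values in x stay NA.
winsorize <- function(x, probs = c(0.10, 0.90)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical variant: replace values outside the quantiles with NA
# instead of clamping them ("X1 without the extreme values").
trim <- function(x, probs = c(0.10, 0.90)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  ifelse(x < q[1] | x > q[2], NA, x)
}

# Stand-in for the example data: 1000 firms x 10 years, 10 industries
set.seed(1)
data <- data.frame(
  firm     = rep(1:1000, each = 10),
  industry = rep(rep(1:10, each = 10), times = 100),
  year     = rep(1998:2007, times = 1000),
  X1       = rnorm(10000)
)

# ave() applies FUN within each industry-year cell and returns the
# results in the original row order, so no merge/bind step is needed
data$X2         <- ave(data$X1, data$industry, data$year, FUN = winsorize)
data$X2_trimmed <- ave(data$X1, data$industry, data$year, FUN = trim)
```

Because ave() returns a vector aligned with the rows of data, X2 sits next to the original columns directly. Other cut-offs can be passed through a wrapper, for example FUN = function(x) winsorize(x, c(0.01, 0.99)).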