[R] Capping outliers

Jim Holtman jholtman at gmail.com
Tue Nov 22 12:24:06 CET 2011


You can easily vectorize this code using pmin/pmax.

Sent from my iPad

On Nov 22, 2011, at 1:06, Aher <ajit.aher at cedar-consulting.com> wrote:

> Hi Experts,
> 
> I am new to R, using following sample code for capping outliers using
> percentile information.  Working on large data (30000 observations and 150
> variables), loop I am using in the below mentioned code for detecting
> outliers and capping to upper /lower percentile value is taking much time
> for the execution.
> Is there anything wrong with code, can anyone suggest improvement in the
> script to enhance performance!
> min_pctle_cut <- 0.01
> max_pctle_cut <- 0.99
> library(outliers)
> 
> n <- 100
> x1 <- runif(n) 
> x2 <- runif(n) 
> x3 <- x1 + x2 + runif(n)/10 
> x4 <- x1 + x2 + x3 + runif(n)/10 
> x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) 
> x6 <- factor(1*(x5=='a' | x5=='c')) 
> data1 <- cbind(x1,x2,x3,x4,x5,x6) 
> x <- data.frame(data1) 
> 
> z <- x[,sapply(x,is.numeric)]
> 
> qs <- sapply(z, function(z) quantile(z,
>    c(min_pctle_cut, max_pctle_cut), na.rm = TRUE)) 
> 
> 
> #Loop below taking time for execution
> 
> system.time(for (i in 1:ncol(z))
> {
>    for (j in 1:nrow(z)) 
> {
> if (z[j,i] < qs[1,i]) z[j,i]=qs[1,i]
> if (z[j,i] > qs[2,i]) z[j,i]=qs[2,i] 
> } 
> })
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Capping-outliers-tp4094647p4094647.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list