[R] Data frame manipulation by eliminating rows containing extreme values

aajit75 aajit75 at yahoo.co.in
Sat Oct 22 12:57:02 CEST 2011


Dear All, 

I have computed the limits for removing extreme values for each variable
using the following function.

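# f returns a 1 x 2 matrix: [,1] = Q3 + 1.5*IQR (upper limit), [,2] = Q1 - 1.5*IQR (lower limit)
f <- function(x) {
  quantile(x, c(0.75, 0.25), na.rm = TRUE) + matrix(1.5 * IQR(x, na.rm = TRUE), nrow = 1) %*% c(1, -1)
}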
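
As a quick sanity check, here is a minimal sketch of such limits on a small made-up vector. The helper name `iqr_limits` and the data are hypothetical, and the sketch assumes the conventional rule of Q1 - 1.5*IQR and Q3 + 1.5*IQR:

```r
# Hypothetical helper: conventional 1.5 * IQR limits, returned as c(lower, upper)
iqr_limits <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- k * IQR(x, na.rm = TRUE)
  c(lower = q[1] - spread, upper = q[2] + spread)
}

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)            # made-up data with one extreme value
lim <- iqr_limits(v)                           # lower = -3, upper = 13
extreme <- v[v < lim["lower"] | v > lim["upper"]]
extreme                                        # 100
```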

#Example:

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
xyz <- lapply(data2, f)  # limits for each column of the data frame

# Now I can eliminate the rows (observations) that contain extreme values,
# one variable at a time, as below.

data2 <- subset(data2, x1 <= xyz$x1[,1] & x1 >= xyz$x1[,2])
data2 <- subset(data2, x2 <= xyz$x2[,1] & x2 >= xyz$x2[,2])

.
.
and so on..

But my data has many more variables (more than 120). Can anybody suggest an
efficient way of eliminating the rows that contain extreme values?
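
One possible direction, as a sketch only: compute the limits for every column at once and keep the rows that fall inside the limits of all columns. It assumes every column is numeric and uses the conventional Q1 - 1.5*IQR and Q3 + 1.5*IQR limits; the names `drop_extreme_rows`, `lims`, `inside` and `keep`, and the toy data, are hypothetical.

```r
# Hypothetical helper: keep rows where every column lies within its 1.5 * IQR limits
drop_extreme_rows <- function(df, k = 1.5) {
  # 2 x ncol(df) matrix: row 1 = lower limits, row 2 = upper limits
  lims <- sapply(df, function(x) {
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    spread <- k * IQR(x, na.rm = TRUE)
    c(q[1] - spread, q[2] + spread)
  })
  # one logical vector per column: TRUE where the value is inside its limits
  inside <- Map(function(x, lo, hi) x >= lo & x <= hi,
                df, lims[1, ], lims[2, ])
  # a row is kept only if it is inside the limits for every column
  keep <- Reduce(`&`, inside)
  # which() also drops rows where the test is NA, as subset() would
  df[which(keep), , drop = FALSE]
}

# Made-up data: only the last row has an extreme value (a = 500)
d <- data.frame(a = c(1:20, 500), b = c(20:1, 10))
res <- drop_extreme_rows(d)
nrow(res)   # 20
```

With the example data this would be called as `drop_extreme_rows(data2)`. The limits are computed once on the full data, which matches repeated subset() calls that use limits computed beforehand.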

Thanks in advance!

Regards,
-Ajit




