[R] Drop values of one dataframe based on the value of another

Sam Albers tonightsthenight at gmail.com
Fri Jun 1 20:40:12 CEST 2012


Hello all,

Let me first say that this isn't a question about outliers. I am using
the outlier function from the outliers package, but only because it is
a convenient wrapper for finding the value that has the largest
difference from the sample mean. Where I am running into problems is
that I have several groups, and I want to calculate the "outlier"
within each group. Then I want to create two data.frames, one with the
"outliers" and the other with those values dropped. Both data.frames
need to include the additional columns of data present before the
subset. The first case is easy, but I can't seem to figure out how to
do the second. So for example:

library(plyr)
library(outliers)

## A dataframe with some obviously extreme values
dfa <- data.frame(Mins=runif(15, 0, 1),
                  Fac=rep(c("Test1","Test2","Test3"), each=5))
df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18,4,5); df

## Dataframe with the extreme value
To_remove<-ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove

So now my question is: how can I use this dataframe (To_remove) to
remove all these values from df and create a new dataframe? Given a df
(To_remove) with a list of values, how can I choose all the rows of
another dataframe (df) that don't match the values in the To_remove
dataframe? There is an rm.outlier function in this same package, but I
am having trouble with that and would like to try another approach.
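
To make the goal concrete, here is a minimal base-R sketch of one
possible approach, not a definitive solution. Instead of looking the
To_remove values up in df afterwards, it computes a logical flag per
group once and uses it for both subsets, so every column (including
Meta) is kept. The helper is_extreme and the name Kept are hypothetical
names introduced for illustration; is_extreme is only an assumed
stand-in for outlier() (largest absolute difference from the group
mean) and may behave differently from the package function when there
are ties.

```r
set.seed(1)  # only so the sketch is reproducible

## Same example data as above
dfa <- data.frame(Mins = runif(15, 0, 1),
                  Fac = rep(c("Test1", "Test2", "Test3"), each = 5))
df.out <- data.frame(Mins = c(3, 4, 5), Fac = c("Test1", "Test2", "Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18, 4, 5)

## Hypothetical helper: TRUE for the value(s) furthest from the mean
is_extreme <- function(x) {
  dev <- abs(x - mean(x))
  dev == max(dev)
}

## ave() applies the helper within each level of Fac and returns a
## vector aligned with the rows of df (coerced to 0/1, hence as.logical)
flag <- as.logical(ave(df$Mins, df$Fac, FUN = is_extreme))

To_remove <- df[flag, ]   # the extreme row(s) of each group
Kept      <- df[!flag, ]  # everything else, all columns intact
```

Because flag is aligned with the rows of df, the two data.frames are
exact complements of each other and no value matching is needed.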

Thanks in advance!

Sam


