[R] R newbie: logical subsets
Joshua Tokle
jtokle at math.washington.edu
Tue Jul 11 20:51:01 CEST 2006
Hello! I'm a newcomer to R hoping to replace some convoluted database
code with an R script. Unfortunately, I haven't been able to figure out
how to implement the following logic.
Essentially, we have a database of transactions that are coded with a
geographic locale and a type. These are being loaded into a data.frame
with named variables city, type, and price (e.g., trans$city).
We want to calculate mean prices by city and type, AFTER excluding
outliers. That is, we want to calculate the mean price in 3 steps:
1. calculate a mean and standard deviation by city and type over all
transactions
2. create a subset of the original data frame, excluding transactions that
differ from the relevant mean by more than 2 standard deviations
3. calculate a final mean by city and type based on this subset.
I'm stuck on step 2. I would like to do something like the following:
    fs <- list(factor(trans$city), factor(trans$type))
    means <- tapply(trans$price, fs, mean)
    stdevs <- tapply(trans$price, fs, sd)
    filter <- abs(trans$price - means[trans$city, trans$type]) <
        2*stdevs[trans$city, trans$type]
    sub <- subset(trans, filter)
The above code doesn't work. What's the correct way to do this?
Thanks,
Josh
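
One possible way to carry out the three steps above is sketched here. It is only an illustration and not necessarily the best approach. It uses base R's ave() to expand each group's mean and standard deviation back to one value per transaction, so that the comparison in step 2 is row by row. The toy data frame is made up for the example; only the column names city, type, and price come from the post.

```r
# Hypothetical toy data standing in for the real transactions table.
trans <- data.frame(
  city  = rep(c("A", "B"), each = 8),
  type  = rep("x", 16),
  price = c(10, 11, 9, 10, 10, 11, 9, 100,   # city A, with one outlier (100)
            20, 21, 19, 20, 22, 18, 20, 20), # city B, no outliers
  stringsAsFactors = FALSE
)

# Step 1: mean and sd per (city, type) group, returned as vectors with
# one entry per row of trans rather than one entry per group.
grp_mean <- ave(trans$price, trans$city, trans$type, FUN = mean)
grp_sd   <- ave(trans$price, trans$city, trans$type, FUN = sd)

# Step 2: keep transactions within 2 standard deviations of their own
# group's mean. Groups with a single transaction have sd = NA, so their
# comparison is NA; subset() treats NA as FALSE and drops those rows.
keep    <- abs(trans$price - grp_mean) < 2 * grp_sd
trimmed <- subset(trans, keep)

# Step 3: final mean by city and type, computed on the trimmed data.
final_means <- tapply(trimmed$price, list(trimmed$city, trimmed$type), mean)
print(final_means)
```

On why the original attempt fails: means[trans$city, trans$type] crosses every row index with every column index and returns a full matrix, not one value per transaction. In current versions of R, indexing with a two-column character matrix, for example means[cbind(as.character(trans$city), as.character(trans$type))], picks out one cell per transaction and could be used in place of ave().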