[R] Advanced Filtering problem
hadley wickham
h.wickham at gmail.com
Fri Jun 20 01:49:12 CEST 2008
Hi Tyler,
> I've attached 100 rows of a data frame I am working with.
> I have one factor, id, with 27 levels. There are two columns of reference
> data, x and y (UTM coordinates), one column "date" in POSIXct format, and
> one column "diff" in times format (chron package).
>
> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row for
> each id which contains the smallest "diff" value, resulting in an output
> containing in general one value per id per day.
There's a basic strategy that makes solving this type of problem much
easier. I call it split-apply-combine. The basic idea is that if you
had a single day, the problem would be pretty easy:
df <- read.csv("http://www.nabble.com/file/p18018170/subdata.csv")
oneday <- subset(df, day == "01-01-05")
oneday[which.min(oneday$diff), ]
# Let's make that into a function to make it easier to apply to all days
mindiff <- function(df) df[which.min(df$diff), ]
# Now we split up the data frame so that we have a data frame for
# each day
pieces <- split(df, df$day)
# And use lapply to apply that function to each piece:
results <- lapply(pieces, mindiff)
# Then finally join all the pieces back together
df_done <- do.call("rbind", results)
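If the attached file is not to hand, the same per-day steps can be tried on a small made-up data frame. This is only a sketch: the `toy` data and its values are invented for illustration, the column names follow the code above, and `diff` is stored as a plain number rather than a chron times value.

```r
# Made-up stand-in for the attached data (hypothetical values).
toy <- data.frame(
  id   = c("a", "a", "b", "b", "a", "b"),
  day  = c("01-01-05", "01-01-05", "01-01-05", "01-01-05",
           "01-02-05", "01-02-05"),
  diff = c(0.30, 0.10, 0.25, 0.40, 0.50, 0.20),
  stringsAsFactors = FALSE
)

# Return the row with the smallest diff in one piece
mindiff <- function(df) df[which.min(df$diff), ]

# Split: a named list holding one data frame per day
pieces <- split(toy, toy$day)
names(pieces)

# Apply: a list of one-row data frames, one per day
results <- lapply(pieces, mindiff)

# Combine: equivalent to rbind(results[[1]], results[[2]])
toy_done <- do.call("rbind", results)
toy_done
```

Printing `pieces` and `results` at each step is a quick way to see what split and lapply hand on to the next stage.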
So we split the data frame into individual days, picked the correct
row for each day, and then joined all the pieces back together. This
isn't the most efficient solution, but I think it's easy to see how
each part works, and how you can apply it to new situations. If you
aren't familiar with lapply or do.call, it's worth having a look at
their examples to get a feel for how they work (although for this case
you can of course just copy and paste them without caring how they
work).
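The code above splits by day only, while the original question asks for one row per id per day. One way to adapt the pattern, sketched here on made-up data, is to split on both columns at once by passing a list of grouping variables to split. The `toy` data frame is hypothetical, and `diff` is a plain number rather than a chron times value.

```r
# Made-up data (hypothetical values): id "b" has no rows on the second day.
toy <- data.frame(
  id   = c("a", "a", "b", "b", "a"),
  day  = c("01-01-05", "01-01-05", "01-01-05", "01-01-05", "01-02-05"),
  diff = c(0.30, 0.10, 0.25, 0.40, 0.50),
  stringsAsFactors = FALSE
)

# Return the row with the smallest diff in one piece
mindiff <- function(df) df[which.min(df$diff), ]

# Split on every id/day combination; drop = TRUE skips combinations
# that never occur in the data, so no empty pieces are created.
pieces <- split(toy, list(toy$id, toy$day), drop = TRUE)

# Apply and combine exactly as before
results <- lapply(pieces, mindiff)
per_id_day <- do.call("rbind", results)
per_id_day
```

Only the split step changes; mindiff, lapply and do.call are reused as they are.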
Hadley
--
http://had.co.nz/