[R] Subset by Factor by date

T.D.Rudolph prairie.picker at gmail.com
Sat Jun 14 07:25:54 CEST 2008


aggregate() is indeed a useful function in this case, but it only returns the
grouping columns and the aggregated value.  Is there a way I can use it while
also retaining all the other columns in the data frame?

For example, if the data frame had an extra column holding any information at
all (superfluous here, yet pertinent later), how could I retain it in the
final output?
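
For readers following along, here is a minimal sketch of two common ways to
keep every column while still picking the minimum "diff" per id and day. It
reuses the toy data from the quoted message below; the extra column "note" is
hypothetical and only stands in for whatever additional information needs to
be retained. It assumes "diff" has no missing values, and tied minima would
all be kept.

```r
# Toy data from the thread, plus a hypothetical extra column 'note'
# standing in for any information that should survive the subsetting.
DF <- data.frame(
  id   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  day  = c("01-01-09", "01-01-09", "01-01-09", "01-01-09",
           "01-02-09", "01-02-09", "01-02-09", "01-02-09"),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  note = letters[1:8],
  stringsAsFactors = FALSE
)

# Option 1: ave() returns the per-group minimum aligned with the original
# rows, so a logical comparison keeps whole rows (all tied rows are kept).
grp_min <- ave(DF$diff, DF$id, DF$day, FUN = min)
res_ave <- DF[DF$diff == grp_min, ]

# Option 2: merge the aggregate() result back onto the full data frame.
mins <- aggregate(DF$diff, list(id = DF$id, day = DF$day), min)
names(mins)[names(mins) == "x"] <- "diff"
res_merge <- merge(DF, mins)  # joins on the shared columns id, day, diff
```

Option 1 preserves the original row order; option 2 returns the rows in the
order merge() chooses, so re-sort afterwards if the order matters.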


Marc Schwartz wrote:
> 
> on 06/13/2008 11:10 PM T.D.Rudolph wrote:
>> I have a dataframe, x, with over 60,000 rows that contains one Factor,
>> "id", with 27 levels.
>> The dataframe contains numerous continuous values (along column "diff")
>> per day (column "date") for every level of id.  I would like to select
>> only one row per animal per day, i.e. that containing the minimum value
>> of "diff", along the full length of 1:nrow(x).  I am not yet able to
>> conduct anything beyond the simplest of functions and I was hoping
>> someone could suggest an effective way of producing this output.
>> 
>> e.g. given this input:
>> 
>> id  day         diff
>> 1  01-01-09  0.5
>> 1  01-01-09  0.7
>> 2  01-01-09  0.2
>> 2  01-01-09  0.4
>> 1  01-02-09  0.1
>> 1  01-02-09  0.3
>> 2  01-02-09  0.3
>> 2  01-02-09  0.4
>> 
>> I would like to produce this output:
>> id  day         diff
>> 1  01-01-09  0.5
>> 2  01-01-09  0.2
>> 1  01-02-09  0.1
>> 2  01-02-09  0.3
>> 
>> It doesn't seem extremely difficult but I'm sure there are easier ways
>> than how I am currently approaching it!
> 
> See ?aggregate
> 
>  > DF
>    id      day diff
> 1  1 01-01-09  0.5
> 2  1 01-01-09  0.7
> 3  2 01-01-09  0.2
> 4  2 01-01-09  0.4
> 5  1 01-02-09  0.1
> 6  1 01-02-09  0.3
> 7  2 01-02-09  0.3
> 8  2 01-02-09  0.4
> 
> 
>  > aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
>    id      day   x
> 1  1 01-01-09 0.5
> 2  2 01-01-09 0.2
> 3  1 01-02-09 0.1
> 4  2 01-02-09 0.3
> 
> 
> Note that I have not converted the 'day' column to a 'date' class. You 
> would need to do that to perform any other date related operations 
> (including chronological sorting) on that column. See ?as.Date for more 
> information. For example:
> 
>    DF$day <- as.Date(DF$day, format = "%m-%d-%y")
> 
> 
> HTH,
> 
> Marc Schwartz
> 

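As a small illustration of the point about date classes in the quoted reply,
the sketch below shows why month-day-year strings do not sort
chronologically until they are converted with as.Date(). The three day
values are hypothetical and chosen only to span a year boundary.

```r
# Hypothetical day strings in the thread's month-day-year format.
day_chr <- c("01-02-09", "12-31-08", "01-01-09")

# Sorted as text, the December 2008 entry lands last.
sorted_chr <- sort(day_chr)

# Converted to Date, the ordering is chronological.
day_date <- as.Date(day_chr, format = "%m-%d-%y")
sorted_date <- sort(day_date)
```

Once the column is a Date, order(DF$id, DF$day) sorts rows within each id
chronologically.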