[R] How to get the rows corresponding to the maximum of a factor

David Winsemius dwinsemius at comcast.net
Tue May 31 21:52:21 CEST 2011


On May 31, 2011, at 2:51 PM, James Rome wrote:

> I have a data frame as follows:
> MsgType    eotpd                       fn
> FI     2011-05-13 01:40:00          0
> FF     2011-05-13 01:39:53          0
> TC     2011-05-13 01:39:45          0
> FI     2011-05-14 00:58:46          1
> FF     2011-05-14 00:58:46          1
> FI     2011-05-15 00:48:32          2
> FF     2011-05-15 00:48:21          2
> TC     2011-05-15 00:48:15          2
> FI     2011-05-16 02:00:01          3
> FF     2011-05-16 01:59:46          3
> FI     2011-05-17 02:22:05          4
> FF     2011-05-17 02:21:58          4
> FI     2011-05-18 01:50:35          5
> FF     2011-05-18 01:50:30          5
> FI     2011-05-19 02:05:24          6
> FF     2011-05-19 02:05:20          6
> TC     2011-05-19 02:05:19          6
> FI     2011-05-13 17:04:15          8
> TC     2011-05-13 17:04:04          8
> FI     2011-05-16 17:32:40          9
> FF     2011-05-16 17:32:19          9
> TC     2011-05-16 17:32:06          9
> FI     2011-05-17 18:39:42         10
> FF     2011-05-17 18:39:38         10
> FI     2011-05-18 17:54:55         11
> FF     2011-05-18 17:54:57         11
> TC     2011-05-18 17:54:50         11
> FI     2011-05-19 17:26:01         12
> FF     2011-05-19 17:26:01         12
> TC     2011-05-19 17:25:53         12
> . . .
> As you can see, I do not always have all three MsgTypes for a given fn.
> The MsgTypes are an ordered factor: FI < FF < TC.
> What I want to get is a data frame having the maximum MsgType and its
> eotpd for each fn:

Assuming this is in a data frame, 'rrr' (so named for my annoyance that
you did not use dput to offer the example), with this structure:
 > str(rrr)
'data.frame':	30 obs. of  3 variables:
  $ V1: Ord.factor w/ 3 levels "FI"<"FF"<"TC": 1 2 3 1 2 1 2 3 1 2 ...
  $ V2: POSIXct, format: "2011-05-13 01:40:00" "2011-05-13 01:39:53" ...
  $ V3: num  0 0 0 1 1 2 2 2 3 3 ...
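For anyone wanting to follow along, a frame with this structure can be rebuilt roughly as follows. This is only a sketch covering the first eight rows of the posted data (fn 0, 1 and 2); the column names V1, V2, V3 follow the str() output above, and the "UTC" time zone is an assumption, since the original post does not state one.

```r
# Rebuild the first eight rows of the posted data (fn 0, 1 and 2 only).
rrr <- data.frame(
  # Ordered factor with the level order shown by str(): FI < FF < TC
  V1 = factor(c("FI", "FF", "TC", "FI", "FF", "FI", "FF", "TC"),
              levels = c("FI", "FF", "TC"), ordered = TRUE),
  V2 = as.POSIXct(c("2011-05-13 01:40:00", "2011-05-13 01:39:53",
                    "2011-05-13 01:39:45", "2011-05-14 00:58:46",
                    "2011-05-14 00:58:46", "2011-05-15 00:48:32",
                    "2011-05-15 00:48:21", "2011-05-15 00:48:15"),
                  tz = "UTC"),  # time zone is an assumption
  V3 = c(0, 0, 0, 1, 1, 2, 2, 2)
)
str(rrr)
```

Calling dput(rrr) on such a frame prints a form that can be pasted straight into a message.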

Then this seems to fit the description:

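  # For each fn (V3), take its row numbers and keep the one whose
  # message type (V1) is highest; which.max() returns the first maximum.
  idx <- sapply(split(seq_len(nrow(rrr)), rrr$V3),
                function(x) x[which.max(rrr$V1[x])])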
 > rrr[idx, ]
    V1                  V2 V3
3  TC 2011-05-13 01:39:45  0
5  FF 2011-05-14 00:58:46  1
8  TC 2011-05-15 00:48:15  2
10 FF 2011-05-16 01:59:46  3
12 FF 2011-05-17 02:21:58  4
14 FF 2011-05-18 01:50:30  5
17 TC 2011-05-19 02:05:19  6
19 TC 2011-05-13 17:04:04  8
22 TC 2011-05-16 17:32:06  9
24 FF 2011-05-17 18:39:38 10
27 TC 2011-05-18 17:54:50 11
30 TC 2011-05-19 17:25:53 12
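
The same rows can probably also be reached without an explicit split, by sorting and then dropping repeats. The sketch below is an alternative, not the method used above; it runs on a cut-down five-row frame (fn 0 and 1 only) whose column names are borrowed from the str() output, with V2 kept as plain character strings for brevity.

```r
# Cut-down stand-in for the posted data: fn 0 and fn 1 only.
rrr <- data.frame(
  V1 = factor(c("FI", "FF", "TC", "FI", "FF"),
              levels = c("FI", "FF", "TC"), ordered = TRUE),
  V2 = c("2011-05-13 01:40:00", "2011-05-13 01:39:53",
         "2011-05-13 01:39:45", "2011-05-14 00:58:46",
         "2011-05-14 00:58:46"),  # character here; POSIXct in the real data
  V3 = c(0, 0, 0, 1, 1)
)

# Sort by fn, then by message type from the highest level to the lowest.
ord <- order(rrr$V3, -as.integer(rrr$V1))
sorted <- rrr[ord, ]

# After sorting, the first row of each fn group holds the maximum type,
# so dropping later rows with a repeated fn leaves one row per fn.
res <- sorted[!duplicated(sorted$V3), ]
res
```

Because whole rows are subset, the timestamp column comes along with the selected message type, and the original row names are preserved.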


-- 
David.


> MsgType    eotpd                       fn
> TC     2011-05-13 01:39:45          0
> FF     2011-05-14 00:58:46          1
> TC     2011-05-15 00:48:15          2
> FF     2011-05-16 01:59:46          3
> FF     2011-05-17 02:21:58          4
> FF     2011-05-18 01:50:30          5
> TC     2011-05-19 02:05:19          6
> TC     2011-05-13 17:04:04          8
> TC     2011-05-16 17:32:06          9
> FF     2011-05-17 18:39:38         10
> TC     2011-05-18 17:54:50         11
> TC     2011-05-19 17:25:53         12
> . . .
>
> Surely there is a clever way to do this in R?
>
> Thanks for the help,
> Jim
>

David Winsemius, MD
West Hartford, CT


