[R] Determining which.max() within groups

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Jun 7 17:57:10 CEST 2017


aggregate() can compute both which.max() and the group length in one call,
but the result ends up as a matrix inside the data frame, which I find
cumbersome to work with.
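
To see what "a matrix inside the data frame" means, here is a minimal 
sketch on toy data. The toy data frame and the names toy, agg and flat are 
made up for illustration and are not part of the thread's data. The 
do.call( data.frame, ... ) step is one common way to flatten the matrix, 
offered as an option rather than as the approach used in this message.

```r
# Toy data (hypothetical): two groups of three values each
toy <- data.frame( g = c( 1, 1, 1, 2, 2, 2 )
                 , x = c( 3, 9, 4, 7, 2, 5 )
                 )
agg <- aggregate( x ~ g
                , data = toy
                , FUN = function(v) c( WM = which.max( v ), n = length( v ) )
                )
# agg has only two columns; the second one is a two-column matrix
ncol( agg )          # 2
is.matrix( agg$x )   # TRUE
colnames( agg$x )    # "WM" "n"
# flatten the matrix column into ordinary columns x.WM and x.n
flat <- do.call( data.frame, agg )
names( flat )        # "g" "x.WM" "x.n"
```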

Daily <- read.table( text =
"     Date  wyr        Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784
", header = TRUE, stringsAsFactors=FALSE)

# this algorithm only works if wyr groups are contiguous, so sort first
Daily <- Daily[ order( Daily$wyr ), ]
# generate a data frame with key column wyr and matrix Q as the second
# column
out <- aggregate( Q ~ wyr
                 , data = Daily
                 , FUN = function(x) {
                      c( WM = which.max(x)
                       , n=length( x )
                       )
                   }
                 )
# put matrix into separate columns Q.WM and Q.n
out[ , paste( "Q", colnames( out$Q ), sep="." ) ] <- out$Q
# drop the matrix
out$Q <- NULL
# form absolute indexes Q.maxidx
out <- within( out, {
         Q.maxidx <- cumsum( c( 0, Q.n[ -length(Q.n) ] ) ) + Q.WM
        })
result <- Daily[ with( out, Q.maxidx ), ]
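
If the contiguity requirement is a nuisance, one alternative in base R is 
to split the row numbers by group and pick the row number of each group's 
maximum directly. This is a sketch rather than a tuned solution. To keep it 
runnable on its own it repeats only the first eight rows of the data 
(water years 1990 and 1991), and the names DailyHead, rows_by_wyr, maxidx 
and result3 are made up for illustration.

```r
# First eight rows of the thread's data, repeated so this block runs alone
DailyHead <- read.table( text =
"      Date  wyr        Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
", header = TRUE, stringsAsFactors = FALSE )

# list of absolute row numbers, one element per water year
rows_by_wyr <- split( seq_len( nrow( DailyHead ) ), DailyHead$wyr )
# within each group, keep the row number where Q is largest
maxidx <- vapply( rows_by_wyr
                , function(i) i[ which.max( DailyHead$Q[ i ] ) ]
                , integer( 1 )
                )
maxidx <- unname( maxidx )   # 4 6
result3 <- DailyHead[ maxidx, ]
```

Because the row numbers travel with the groups, this does not depend on 
the rows being sorted by wyr.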

# or save ourselves some effort
library(dplyr)
result2 <- (   Daily
            %>% group_by( wyr )
            %>% slice( which.max( Q ) )
            %>% as.data.frame
            )
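
As for the Julian day, a minimal sketch follows, assuming that "Julian 
day" here means day of the year (1 to 366) and not a count of days since 
some origin, which is what base R's julian() returns. The peaks data frame 
is built by hand from the five rows of the data that hold each water 
year's maximum, and the names peaks and doy are made up for illustration.

```r
# Dates of each water year's maximum Q in the thread's data (rows 4, 6, 9, 15, 18)
peaks <- data.frame( Date = c( "1911-04-04", "1911-04-06", "1911-04-09"
                             , "1911-04-16", "1911-04-28" )
                   , wyr = 1990:1994
                   , stringsAsFactors = FALSE
                   )
peaks$Date <- as.Date( peaks$Date )
# "%j" formats the day of the year as a zero-padded string, "001" to "366"
peaks$doy <- as.integer( format( peaks$Date, "%j" ) )
peaks$doy   # 94 96 99 106 118
```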

On Tue, 6 Jun 2017, Bert Gunter wrote:

> cumsum() seems to be what you need.
>
> This can probably be done more elegantly, but ...
>
> out <- aggregate(Q ~ wyr, data = Daily, which.max)
> tbl <- table(Daily$wyr)
> out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)]))
> out
>
> ## yields
>
>   wyr  Q
> 1 1990  4
> 2 1991  6
> 3 1992  9
> 4 1993 15
> 5 1994 18
>
> I leave the matter of Julian dates to you or others.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote:
>> Using the dataset below, I got close to what I'm after, but not quite all
>> the way there.  Any suggestions appreciated:
>>
>> Daily <- read.table(textConnection("     Date  wyr        Q
>> 1911-04-01 1990 4.530695
>> 1911-04-02 1990 4.700596
>> 1911-04-03 1990 4.898814
>> 1911-04-04 1990 5.097032
>> 1911-04-05 1991 5.295250
>> 1911-04-06 1991 6.569508
>> 1911-04-07 1991 5.861587
>> 1911-04-08 1991 5.153666
>> 1911-04-09 1992 4.445745
>> 1911-04-10 1992 3.737824
>> 1911-04-11 1992 3.001586
>> 1911-04-12 1992 3.001586
>> 1911-04-13 1993 2.350298
>> 1911-04-14 1993 2.661784
>> 1911-04-16 1993 3.001586
>> 1911-04-17 1993 2.661784
>> 1911-04-19 1994 2.661784
>> 1911-04-28 1994 3.369705
>> 1911-04-29 1994 3.001586
>> 1911-05-20 1994 2.661784"),header=TRUE)
>>
>> aggregate(Q ~ wyr, data = Daily, which.max)
>>
>> # gives:
>> #    wyr Q
>> # 1 1990 4
>> # 2 1991 2
>> # 3 1992 1
>> # 4 1993 3
>> # 5 1994 2
>>
>> I can 'see' that it is returning the which.max() relative to each
>> grouping.  Is there a way to instead return the absolute position (row) of
>> the max value within each group.  i.e.:
>>
>> # Would instead like to have
>> #     wyr  Q
>> # 1  1990  4
>> # 2  1991  6
>> # 3  1992  9
>> # 4  1993  15
>> # 5  1994  18
>>
>> The icing on the cake would be to get the Julian Day corresponding to the
>> date on which each year's maximum occurs?
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


