[R] Determining which.max() within groups
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Wed Jun 7 17:57:10 CEST 2017
Aggregate can do both which.max and group length calculations, but the
result ends up as a matrix inside the data frame, which I find cumbersome
to work with.
Daily <- read.table( text =
" Date wyr Q
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784
", header = TRUE, stringsAsFactors=FALSE)
# this algorithm only works if wyr groups are contiguous
out <- out[ order(out$wyr), ]
# generate a data frame with key column wyr and matrix Q as the second
column
out <- aggregate( Q ~ wyr
, data = Daily
, FUN = function(x) {
c( WM = which.max(x)
, n=length( x )
)
}
)
# put matrix into separate columns Q.WM
out[ , paste( "Q", colnames( out$Q ), sep="." ) ] <- out$Q
# drop the matrix
out$Q <- NULL
# form absolute indexes Q.N
out <- within( out, {
Q.maxidx <- cumsum( c( 0, Q.n[ -length(Q.n) ] ) ) + Q.WM
})
result <- Daily[ with( out, Q.maxidx ), ]
# or save ourselves some effort
library(dplyr)
result2 <- ( Daily
%>% group_by( wyr )
%>% slice( which.max( Q ) )
%>% as.data.frame
)
On Tue, 6 Jun 2017, Bert Gunter wrote:
> cumsum() seems to be what you need.
>
> This can probably be done more elegantly, but ...
>
> out <- aggregate(Q ~ wyr, data = Daily, which.max)
> tbl <- table(Daily$wyr)
> out$Q <- out$Q + cumsum(c(0,tbl[-length(tbl)]))
> out
>
> ## yields
>
> wyr Q
> 1 1990 4
> 2 1991 6
> 3 1992 9
> 4 1993 15
> 5 1994 18
>
> I leave the matter of Julian dates to you or others.
>
> Cheers,
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Jun 6, 2017 at 6:30 PM, Morway, Eric <emorway at usgs.gov> wrote:
>> Using the dataset below, I got close to what I'm after, but not quite all
>> the way there. Any suggestions appreciated:
>>
>> Daily <- read.table(textConnection(" Date wyr Q
>> 1911-04-01 1990 4.530695
>> 1911-04-02 1990 4.700596
>> 1911-04-03 1990 4.898814
>> 1911-04-04 1990 5.097032
>> 1911-04-05 1991 5.295250
>> 1911-04-06 1991 6.569508
>> 1911-04-07 1991 5.861587
>> 1911-04-08 1991 5.153666
>> 1911-04-09 1992 4.445745
>> 1911-04-10 1992 3.737824
>> 1911-04-11 1992 3.001586
>> 1911-04-12 1992 3.001586
>> 1911-04-13 1993 2.350298
>> 1911-04-14 1993 2.661784
>> 1911-04-16 1993 3.001586
>> 1911-04-17 1993 2.661784
>> 1911-04-19 1994 2.661784
>> 1911-04-28 1994 3.369705
>> 1911-04-29 1994 3.001586
>> 1911-05-20 1994 2.661784"),header=TRUE)
>>
>> aggregate(Q ~ wyr, data = Daily, which.max)
>>
>> # gives:
>> # wyr Q
>> # 1 1990 4
>> # 2 1991 2
>> # 3 1992 1
>> # 4 1993 3
>> # 5 1994 2
>>
>> I can 'see' that it is returning the which.max() relative to each
>> grouping. Is there a way to instead return the absolute position (row) of
>> the max value within each group. i.e.:
>>
>> # Would instead like to have
>> # wyr Q
>> # 1 1990 4
>> # 2 1991 6
>> # 3 1992 9
>> # 4 1993 15
>> # 5 1994 18
>>
>> The icing on the cake would be to get the Julien Day corresponding to the
>> date on which each year's maximum occurs?
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
More information about the R-help
mailing list