[R] How to extract last value in each group

peter dalgaard pdalgd at gmail.com
Thu Aug 15 08:23:37 CEST 2013


On Aug 15, 2013, at 00:03 , David Winsemius wrote:

> 
> On Aug 14, 2013, at 2:18 PM, Steve Lianoglou wrote:
> 
>> While we're playing code golf, likely faster still could be to use
>> data.table. Assume your data is in a data.frame named "x":
>> 
>> R> library(data.table)
>> R> x <- data.table(x, key=c('Date', 'Time'))
>> R> ans <- x[, .SD[.N], by='Date']
> 
> I thought code golf was about the most compact code:
> 
> dat1[ tapply(rownames(dat1), dat1$Date, tail, 1) , ]
> 
>         Date Time      O      H      L      C U D
> 4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
> 11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0

or even:

> aggregate(dat1[-1], dat1[1], tail, 1)
        Date Time      O      H      L      C U D
1 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
2 06/02/2010  338 136.80 136.80 136.80 136.80 3 0

(This relies on Date being the first column. For generality, I suppose you need

> aggregate(dat1, dat1["Date"], tail, 1)[-1]
        Date Time      O      H      L      C U D
1 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
2 06/02/2010  338 136.80 136.80 136.80 136.80 3 0

)
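For anyone who wants to run the two aggregate() calls side by side, here is a minimal self-contained sketch. It assumes the sample data from the original post, read in the same way as in arun's message; the names last1 and last2 are only for illustration. Note that aggregate() returns fresh row names (1, 2) instead of the original row numbers (4, 11).

```r
# Sample data from the original post, read as in arun's message.
dat1 <- read.table(text = "
      Date Time      O      H      L      C  U  D
06/01/2010 1358 136.40 136.40 136.35 136.35  2 12
06/01/2010 1359 136.40 136.50 136.35 136.50  9  6
06/01/2010 1400 136.45 136.55 136.35 136.40  8  7
06/01/2010 1700 136.55 136.55 136.55 136.55  1  0
06/02/2010  331 136.55 136.70 136.50 136.70 36  6
06/02/2010  332 136.70 136.70 136.65 136.65  3  1
06/02/2010  334 136.75 136.75 136.75 136.75  1  0
06/02/2010  335 136.80 136.80 136.80 136.80  4  0
06/02/2010  336 136.80 136.80 136.80 136.80  8  0
06/02/2010  337 136.75 136.80 136.75 136.80  1  2
06/02/2010  338 136.80 136.80 136.80 136.80  3  0
", header = TRUE, stringsAsFactors = FALSE)

# Variant 1: Date is column 1, so aggregate every other column by it.
# tail(x, 1) keeps the last element of each column within each group.
last1 <- aggregate(dat1[-1], dat1[1], tail, 1)

# Variant 2: name the grouping column, aggregate all columns, then
# drop the leading group column that aggregate() prepends.
last2 <- aggregate(dat1, dat1["Date"], tail, 1)[-1]

print(last1)
print(last2)
```

Both variants take the last row as stored within each Date, so they rely on the rows already being in time order within each day.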

> 
>> 
>> -steve
>> 
>> On Wed, Aug 14, 2013 at 2:01 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>> A somewhat faster version (for datasets with lots of dates, assuming it is sorted by date and time) is
>>> isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>>> f3 <- function(dataFrame) {
>>>     dataFrame[ isLastInRun(dataFrame$Date), ]
>>> }
>>> where your two suggestions, as functions, are
>>> f1 <- function (dataFrame) {
>>>     dataFrame[unlist(with(dataFrame, tapply(Time, list(Date), FUN = function(x) x == max(x)))), ]
>>> }
>>> f2 <- function (dataFrame) {
>>>     dataFrame[cumsum(with(dataFrame, tapply(Time, list(Date), FUN = which.max))), ]
>>> }
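The run-boundary version (f3) depends on the stated assumption that the data is sorted by date and time. As a rough sketch of how it could be made independent of that, one might sort first and then pick the run ends. The wrapper name lastPerDate and the small shuffled data frame are made up for illustration. With Date stored as "mm/dd/yyyy" text, the sort is alphabetical, which still groups equal dates together but is not chronological across years.

```r
# Run-boundary test from the message above: TRUE where the next
# element differs from the current one, and TRUE for the final element.
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Hypothetical wrapper: sort by Date, then Time, so that the latest
# row of each date is the one that ends its run.
lastPerDate <- function(dataFrame) {
    sorted <- dataFrame[order(dataFrame$Date, dataFrame$Time), ]
    sorted[isLastInRun(sorted$Date), ]
}

# Illustrative, deliberately shuffled subset of the posted data.
dat <- data.frame(
    Date = c("06/02/2010", "06/01/2010", "06/02/2010", "06/01/2010"),
    Time = c(338L, 1358L, 331L, 1700L),
    C    = c(136.80, 136.35, 136.70, 136.55),
    stringsAsFactors = FALSE
)

res <- lastPerDate(dat)
print(res)
```

The sort costs extra time, so for data that is known to be ordered already, calling f3 directly remains the cheaper choice.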
>>> 
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>>>> Of arun
>>>> Sent: Wednesday, August 14, 2013 1:08 PM
>>>> To: Noah Silverman
>>>> Cc: R help
>>>> Subject: Re: [R] How to extract last value in each group
>>>> 
>>>> Hi,
>>>> Try:
>>>> dat1<- read.table(text="
>>>>       Date Time      O      H      L      C  U  D
>>>> 06/01/2010 1358 136.40 136.40 136.35 136.35  2  12
>>>> 06/01/2010 1359 136.40 136.50 136.35 136.50  9  6
>>>> 06/01/2010 1400 136.45 136.55 136.35 136.40  8  7
>>>> 06/01/2010 1700 136.55 136.55 136.55 136.55  1  0
>>>> 06/02/2010  331 136.55 136.70 136.50 136.70  36  6
>>>> 06/02/2010  332 136.70 136.70 136.65 136.65  3  1
>>>> 06/02/2010  334 136.75 136.75 136.75 136.75  1  0
>>>> 06/02/2010  335 136.80 136.80 136.80 136.80  4  0
>>>> 06/02/2010  336 136.80 136.80 136.80 136.80  8  0
>>>> 06/02/2010  337 136.75 136.80 136.75 136.80  1  2
>>>> 06/02/2010  338 136.80 136.80 136.80 136.80  3  0
>>>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>> 
>>>> dat1[unlist(with(dat1,tapply(Time,list(Date),FUN=function(x) x==max(x)))),]
>>>> #         Date Time      O      H      L      C U D
>>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>>> #or
>>>> dat1[cumsum(with(dat1,tapply(Time,list(Date),FUN=which.max))),]
>>>> #        Date Time      O      H      L      C U D
>>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>>> 
>>>> #or
>>>> dat1[as.logical(with(dat1,ave(Time,Date,FUN=function(x) x==max(x)))),]
>>>> #        Date Time      O      H      L      C U D
>>>> #4  06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>>>> #11 06/02/2010  338 136.80 136.80 136.80 136.80 3 0
>>>> A.K.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> From: Noah Silverman <noahsilverman at ucla.edu>
>>>> To: "R-help at r-project.org" <r-help at r-project.org>
>>>> Cc:
>>>> Sent: Wednesday, August 14, 2013 3:56 PM
>>>> Subject: [R] How to extract last value in each group
>>>> 
>>>> Hello,
>>>> 
>>>> I have some stock pricing data at one-minute intervals.
>>>> 
>>>> The delivery format is a bit odd.  The date column is easily parsed and used as an index
>>>> for an its object.  However, the time column is just an integer (1:1807)
>>>> 
>>>> I just need to extract the *last* entry for each day.  Don't actually care what time it was,
>>>> as long as it was the last one.
>>>> 
>>>> Sure, writing a big nasty loop would work, but I was hoping that someone would be able
>>>> to suggest a faster way.
>>>> 
>>>> Small snippet of data below my sig.
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> --
>>>> Noah Silverman, M.S., C.Phil
>>>> UCLA Department of Statistics
>>>> 8117 Math Sciences Building
>>>> Los Angeles, CA 90095
>>>> 
>>>> --------------------------------------------------------------------------
>>>> 
>>>>       Date Time      O      H      L      C  U  D
>>>> 06/01/2010 1358 136.40 136.40 136.35 136.35   2  12
>>>> 06/01/2010 1359 136.40 136.50 136.35 136.50   9   6
>>>> 06/01/2010 1400 136.45 136.55 136.35 136.40   8   7
>>>> 06/01/2010 1700 136.55 136.55 136.55 136.55   1   0
>>>> 06/02/2010  331 136.55 136.70 136.50 136.70  36   6
>>>> 06/02/2010  332 136.70 136.70 136.65 136.65   3   1
>>>> 06/02/2010  334 136.75 136.75 136.75 136.75   1   0
>>>> 06/02/2010  335 136.80 136.80 136.80 136.80   4   0
>>>> 06/02/2010  336 136.80 136.80 136.80 136.80   8   0
>>>> 06/02/2010  337 136.75 136.80 136.75 136.80   1   2
>>>> 06/02/2010  338 136.80 136.80 136.80 136.80   3   0
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>> 
> 
> David Winsemius
> Alameda, CA, USA
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


