[R] How to extract last value in each group
Steve Lianoglou
lianoglou.steve at gene.com
Wed Aug 14 23:18:03 CEST 2013
While we're playing code golf, data.table is likely to be faster still.
Assume your data is in a data.frame named "x". Keying on Date and Time
sorts the rows, so .SD[.N] below is the last (latest Time) row of each Date:
R> library(data.table)
R> x <- data.table(x, key=c('Date', 'Time'))
R> ans <- x[, .SD[.N], by='Date']
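
A self-contained sketch of the same idea, assuming the sample rows from the original post stand in for the real data. The row shuffling is only there for illustration, to show that the key, not the input order, decides which row counts as "last":

```r
library(data.table)

# Sample rows copied from the original post.
x <- read.table(text = "
Date Time O H L C U D
06/01/2010 1358 136.40 136.40 136.35 136.35 2 12
06/01/2010 1359 136.40 136.50 136.35 136.50 9 6
06/01/2010 1400 136.45 136.55 136.35 136.40 8 7
06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
06/02/2010 331 136.55 136.70 136.50 136.70 36 6
06/02/2010 332 136.70 136.70 136.65 136.65 3 1
06/02/2010 334 136.75 136.75 136.75 136.75 1 0
06/02/2010 335 136.80 136.80 136.80 136.80 4 0
06/02/2010 336 136.80 136.80 136.80 136.80 8 0
06/02/2010 337 136.75 136.80 136.75 136.80 1 2
06/02/2010 338 136.80 136.80 136.80 136.80 3 0
", header = TRUE, stringsAsFactors = FALSE)

# Illustration only: reverse the rows so the input is no longer sorted.
x <- x[rev(seq_len(nrow(x))), ]

# Setting the key sorts by Date, then Time.
x <- data.table(x, key = c("Date", "Time"))

# .N is the number of rows in the current group, so .SD[.N] is its last row.
ans <- x[, .SD[.N], by = "Date"]
print(ans)
```

Note that Date is kept as a character column here, so the key orders dates as strings. That does not affect which row is last within a day, but convert to Date class first if the order of the days themselves matters.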
-steve
On Wed, Aug 14, 2013 at 2:01 PM, William Dunlap <wdunlap at tibco.com> wrote:
> A somewhat faster version (for datasets with lots of dates, assuming the data are sorted by date and time) is
> isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> f3 <- function(dataFrame) {
>     dataFrame[ isLastInRun(dataFrame$Date), ]
> }
> where your two suggestions, as functions, are
> f1 <- function (dataFrame) {
>     dataFrame[unlist(with(dataFrame, tapply(Time, list(Date), FUN = function(x) x == max(x)))), ]
> }
> f2 <- function (dataFrame) {
>     dataFrame[cumsum(with(dataFrame, tapply(Time, list(Date), FUN = which.max))), ]
> }
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of arun
>> Sent: Wednesday, August 14, 2013 1:08 PM
>> To: Noah Silverman
>> Cc: R help
>> Subject: Re: [R] How to extract last value in each group
>>
>> Hi,
>> Try:
>> dat1<- read.table(text="
>> Date Time O H L C U D
>> 06/01/2010 1358 136.40 136.40 136.35 136.35 2 12
>> 06/01/2010 1359 136.40 136.50 136.35 136.50 9 6
>> 06/01/2010 1400 136.45 136.55 136.35 136.40 8 7
>> 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>> 06/02/2010 331 136.55 136.70 136.50 136.70 36 6
>> 06/02/2010 332 136.70 136.70 136.65 136.65 3 1
>> 06/02/2010 334 136.75 136.75 136.75 136.75 1 0
>> 06/02/2010 335 136.80 136.80 136.80 136.80 4 0
>> 06/02/2010 336 136.80 136.80 136.80 136.80 8 0
>> 06/02/2010 337 136.75 136.80 136.75 136.80 1 2
>> 06/02/2010 338 136.80 136.80 136.80 136.80 3 0
>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>
>> dat1[unlist(with(dat1,tapply(Time,list(Date),FUN=function(x) x==max(x)))),]
>> # Date Time O H L C U D
>> #4 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>> #11 06/02/2010 338 136.80 136.80 136.80 136.80 3 0
>> #or
>> dat1[cumsum(with(dat1,tapply(Time,list(Date),FUN=which.max))),]
>> # Date Time O H L C U D
>> #4 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>> #11 06/02/2010 338 136.80 136.80 136.80 136.80 3 0
>>
>> #or
>> dat1[as.logical(with(dat1,ave(Time,Date,FUN=function(x) x==max(x)))),]
>> # Date Time O H L C U D
>> #4 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>> #11 06/02/2010 338 136.80 136.80 136.80 136.80 3 0
>> A.K.
>>
>>
>>
>>
>> ----- Original Message -----
>> From: Noah Silverman <noahsilverman at ucla.edu>
>> To: "R-help at r-project.org" <r-help at r-project.org>
>> Cc:
>> Sent: Wednesday, August 14, 2013 3:56 PM
>> Subject: [R] How to extract last value in each group
>>
>> Hello,
>>
>> I have some stock pricing data for one minute intervals.
>>
>> The delivery format is a bit odd. The date column is easily parsed and used as an index
>> for an its object. However, the time column is just an integer (1:1807).
>>
>> I just need to extract the *last* entry for each day. Don't actually care what time it was,
>> as long as it was the last one.
>>
>> Sure, writing a big nasty loop would work, but I was hoping that someone would be able
>> to suggest a faster way.
>>
>> Small snippet of data below my sig.
>>
>> Thanks!
>>
>>
>> --
>> Noah Silverman, M.S., C.Phil
>> UCLA Department of Statistics
>> 8117 Math Sciences Building
>> Los Angeles, CA 90095
>>
>> --------------------------------------------------------------------------
>>
>> Date Time O H L C U D
>> 06/01/2010 1358 136.40 136.40 136.35 136.35 2 12
>> 06/01/2010 1359 136.40 136.50 136.35 136.50 9 6
>> 06/01/2010 1400 136.45 136.55 136.35 136.40 8 7
>> 06/01/2010 1700 136.55 136.55 136.55 136.55 1 0
>> 06/02/2010 331 136.55 136.70 136.50 136.70 36 6
>> 06/02/2010 332 136.70 136.70 136.65 136.65 3 1
>> 06/02/2010 334 136.75 136.75 136.75 136.75 1 0
>> 06/02/2010 335 136.80 136.80 136.80 136.80 4 0
>> 06/02/2010 336 136.80 136.80 136.80 136.80 8 0
>> 06/02/2010 337 136.75 136.80 136.75 136.80 1 2
>> 06/02/2010 338 136.80 136.80 136.80 136.80 3 0
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech