[R] How to extract last value in each group
arun
smartpink111 at yahoo.com
Thu Aug 15 22:55:31 CEST 2013
I usually get faster timings with data.table, except in this situation.
For instance, take another example unrelated to the current topic:
set.seed(1254)
name<- sample(letters,1e6,replace=TRUE)
number<- sample(1:10,1e6,replace=TRUE)
datTest<- data.frame(name,number,stringsAsFactors=FALSE)
system.time(res1<-aggregate(number~name,data=datTest,sum))
# user system elapsed
# 1.332 0.004 1.384
library(data.table)
dtTest<- data.table(datTest)
system.time(res3<- dtTest[,list(Sum_Number=sum(number)),by=name])
# user system elapsed
# 0.052 0.000 0.051
res3New<- res3[order(name),]
names(res1)<-names(res3New)
identical(res1,as.data.frame(res3New))
#[1] TRUE
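For comparison, base R's rowsum() also computes grouped sums in compiled code. Below is a minimal sketch that uses a smaller sample (1e4 rows instead of 1e6) and reports no timings. It only checks that rowsum() returns the same table as aggregate(). It is offered as an alternative worth timing yourself, not as a benchmark result.

```r
set.seed(1254)
name <- sample(letters, 1e4, replace = TRUE)
number <- sample(1:10, 1e4, replace = TRUE)
datTest <- data.frame(name, number, stringsAsFactors = FALSE)

# aggregate() returns one row per name, sorted by name
res1 <- aggregate(number ~ name, data = datTest, sum)

# rowsum() returns a one-column matrix whose row names are the sorted groups;
# convert it to a data frame with the same layout as res1
res2 <- rowsum(datTest$number, datTest$name)
res2 <- data.frame(name = rownames(res2), number = unname(res2[, 1]),
                   stringsAsFactors = FALSE, row.names = NULL)

all.equal(res1, res2)
```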
A.K.
----- Original Message -----
From: Steve Lianoglou <lianoglou.steve at gene.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Thursday, August 15, 2013 4:48 PM
Subject: Re: [R] How to extract last value in each group
Hi,
On Thu, Aug 15, 2013 at 1:38 PM, arun <smartpink111 at yahoo.com> wrote:
> I tried it again on a fresh start using the data.table alone:
> Now.
>
> dt1 <- data.table(dat2, key=c('Date', 'Time'))
> system.time(ans <- dt1[, .SD[.N], by='Date'])
> # user system elapsed
> # 40.908 0.000 40.981
> #Then tried:
> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
> # user system elapsed
> # 0.148 0.000 0.151 #same time as before
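The cumsum(rle(dat2[,1])$lengths) idiom works because rle() returns the length of each run of identical consecutive values. The running total of those lengths is therefore the row index of the last row in each run. It gives exactly one row per group only if the rows are already contiguous by the grouping column. Here that is presumably dat2 ordered by Date and then Time. Since dat2 itself is not shown in the thread, the sketch below uses a small hypothetical data frame. It also shows duplicated(..., fromLast = TRUE) as a base-R alternative. That alternative keeps the last occurrence of each Date in row order, so it still needs Time-ordered rows to return the latest entry.

```r
# Hypothetical toy data (dat2 is not shown in the thread), already
# ordered by Date and by Time within Date
dat <- data.frame(Date  = c("2013-08-01", "2013-08-01", "2013-08-02",
                            "2013-08-02", "2013-08-02", "2013-08-03"),
                  Time  = c("09:00", "16:00", "09:00", "12:00", "16:00", "09:00"),
                  Value = c(1.5, 2.0, 3.1, 3.3, 2.9, 4.0),
                  stringsAsFactors = FALSE)

runs <- rle(dat[, 1])              # runs of identical consecutive Dates
runs$lengths                       # 2 3 1
last_idx <- cumsum(runs$lengths)   # 2 5 6: row index of each run's last row
dat[last_idx, ]                    # last row of each Date

# Base-R alternative: keep the last occurrence of each Date in row order,
# whether or not the groups are contiguous
dat[!duplicated(dat$Date, fromLast = TRUE), ]
```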
Amazing. This is what I get on my MacBook Pro, i7 @ 3GHz (specs very
close to your machine's):
R> dt1 <- data.table(dat2, key=c('Date', 'Time'))
R> system.time(ans <- dt1[, .SD[.N], by='Date'])
user system elapsed
0.064 0.009 0.073
R> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
user system elapsed
0.148 0.016 0.165
On one of our compute servers, running who knows what processor on some
version of Linux (which shouldn't really matter, since we're comparing
the two timings relative to each other):
R> system.time(ans <- dt1[, .SD[.N], by='Date'])
user system elapsed
0.160 0.012 0.170
R> system.time(res7<- dat2[cumsum(rle(dat2[,1])$lengths),])
user system elapsed
0.292 0.004 0.294
There's got to be some other explanation for the heavily degraded
performance you're observing... our R & data.table versions also
match.
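When identical code runs orders of magnitude slower on one machine, it helps to put the exact environments side by side. Below is a minimal sketch of a hypothetical helper, env_report(), which is not part of R or data.table. It collects the R version string, the platform, and the installed version of a package (data.table by default). If that package is not installed, it returns NA for the version. sessionInfo() prints a fuller report, including attached packages and their versions.

```r
# Hypothetical helper (not part of R or data.table): gather the details worth
# comparing when identical code gives very different timings on two machines
env_report <- function(pkg = "data.table") {
  pkg_version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_  # package not installed on this machine
  }
  list(R        = R.version.string,
       platform = R.version$platform,
       package  = pkg,
       version  = pkg_version)
}

str(env_report())

# Fuller report, including attached packages and their versions:
# sessionInfo()
```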
-steve
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech