[R] aggregate() runs out of memory
Sam Steingold
sds at gnu.org
Mon Nov 26 23:59:27 CET 2012
Hi,
> * Steve Lianoglou <znvyvatyvfg.ubarlcbg at tznvy.pbz> [2012-11-26 17:32:21 -0500]:
>
>> --8<---------------cut here---------------start------------->8---
>> > f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
>> > f
>>    id country delay
>> 1   1       6     1
>> 2   2       7     2
>> 3   3       8     3
>> 4   1       6     4
>> 5   2       7     5
>> 6   3       8     6
>> 7   1       6     7
>> 8   2       7     8
>> 9   3       8     9
>> 10  1       6    10
>> 11  2       7    11
>> 12  3       8    12
>> > f <- as.data.table(f)
>> > setkey(f,id)
>> > delays <- f[,list(min=min(delay),max=max(delay),count=.N,country=unique(country)),by="id"]
>> > delays
>>    id min max count country
>> 1:  1   1  10     4       6
>> 2:  2   2  11     4       7
>> 3:  3   3  12     4       8
>> --8<---------------cut here---------------end--------------->8---
>>
>> this is still too slow, apparently because of unique.
>> how do I speed it up?
>
> I think I'm missing something.
>
> Your call to `min(delay)` and `max(delay)` will return the minimum and
> maximum delays within the particular "id" you are grouping by. I guess
> there must be several values for "country" within each "id" group --
> do you really want the same min and max values to be replicated as
> many times as there are unique "country"s?
There is precisely one country for each id,
i.e., unique(country) is the same as country[1] within every group.
Thanks a lot for the suggestion!
> R> result <- f[, list(min=min(delay), max=max(delay),
> count=.N,country=country[1L]), by="share.id"]
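
For reference, a minimal sketch of that suggestion applied to the toy data quoted above. It assumes the data.table package is installed, and it groups by "id" because that is the column in the toy example (the "share.id" in the quoted call is presumably the column name in the real data). The one-country-per-id check is an addition for illustration, not part of the original suggestion.

```r
library(data.table)

# Same toy data as in the quoted example.
f <- data.table(id = rep(1:3, 4), country = rep(6:8, 4), delay = 1:12)
setkey(f, id)

# Illustrative sanity check (an addition, not from the thread):
# each id must map to exactly one country, otherwise country[1L]
# would silently drop the other values.
per_id <- f[, length(unique(country)), by = id]
stopifnot(all(per_id$V1 == 1L))

# Take the first country in each group instead of calling unique()
# once per group.
delays <- f[, list(min = min(delay), max = max(delay),
                   count = .N, country = country[1L]), by = id]
print(delays)
```

If the check fails on real data, country[1L] is not a safe replacement for unique(country), and the grouping would need to include country as well.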
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://pmw.org.il
http://honestreporting.com http://americancensorship.org
Why do you never call me back after I scream that I will never talk to you again?!