[R] aggregate() runs out of memory
Sam Steingold
sds at gnu.org
Mon Nov 26 22:57:52 CET 2012
hi Steve,
> * Steve Lianoglou <znvyvatyvfg.ubarlcbg at tznvy.pbz> [2012-11-26 16:08:59 -0500]:
> On Mon, Nov 26, 2012 at 3:13 PM, Sam Steingold <sds at gnu.org> wrote:
>>> * Steve Lianoglou <znvyvatyvfg.ubarlcbg at tznvy.pbz> [2012-11-19 13:30:03 -0800]:
>>>
>>> For instance, if you want the min and max of `delay` within each group
>>> defined by `share.id`, and let's assume `infl` is a data.frame, you
>>> can do something like so:
>>>
>>> R> infl <- as.data.table(infl)
>>> R> setkey(infl, share.id)
>>> R> result <- infl[, list(min=min(delay), max=max(delay)), by="share.id"]
>>
>> perfect, thanks.
>> alas, the resulting table does not contain the share.id column.
>> do I need to add something like "id=unique(share.id)" to the list?
>> also, if there is a field in the original table infl which only depends
>> on share.id, how do I add this unique value to the summary?
>> it appears that "count=unique(country)" in list() does what I need, but
>> it slows down the process.
>
> Hmm ... I think it should be there, but I'm having a hard time
> remembering what you want.
>
> Could you please copy and paste the output of `(head(infl, 20))` as
> well as an approximation of what the result is that you want.
this prints all the levels for all the factor columns and takes
megabytes.
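When a sample does have to be shared, base R's droplevels() can strip the unused factor levels before the sample is printed or passed to dput(), which usually shrinks the output drastically. A minimal sketch, using a made-up stand-in for `infl` (the real columns are not known here):

```r
# Hypothetical stand-in for `infl`: a factor with 1000 levels,
# only 3 of which actually occur in the rows we want to share.
infl <- data.frame(
  share.id = factor(rep(1:3, 4), levels = 1:1000),
  delay    = 1:12
)

small <- droplevels(head(infl, 20))  # keep only the levels present in the sample
nlevels(infl$share.id)   # 1000 levels in the full factor
nlevels(small$share.id)  # 3 levels after droplevels()
dput(small)              # compact, pasteable reproduction of the sample
```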
--8<---------------cut here---------------start------------->8---
> f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
> f
   id country delay
1   1       6     1
2   2       7     2
3   3       8     3
4   1       6     4
5   2       7     5
6   3       8     6
7   1       6     7
8   2       7     8
9   3       8     9
10  1       6    10
11  2       7    11
12  3       8    12
> f <- as.data.table(f)
> setkey(f,id)
> delays <- f[,list(min=min(delay),max=max(delay),count=.N,country=unique(country)),by="id"]
> delays
   id min max count country
1:  1   1  10     4       6
2:  2   2  11     4       7
3:  3   3  12     4       8
--8<---------------cut here---------------end--------------->8---
this is still too slow, apparently because of unique.
how do I speed it up?
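Two common ways to avoid calling unique() once per group are sketched below on the same toy table `f`. Both assume, as in this example, that `country` is constant within each `id`. If it is not, grouping by both columns yields extra groups, and `country[1L]` silently keeps only the first value. The data.table package is assumed to be installed. Whether either variant is actually faster on the real data is worth timing rather than assuming.

```r
library(data.table)

f <- data.table(id = rep(1:3, 4), country = rep(6:8, 4), delay = 1:12)

# Option 1: country depends only on id, so grouping by both columns gives
# the same groups and returns country as a grouping column, with no
# per-group function call needed for it.
d1 <- f[, list(min = min(delay), max = max(delay), count = .N),
        by = c("id", "country")]

# Option 2: keep grouping by id alone, but take the first country value
# in each group instead of computing unique() for every group.
d2 <- f[, list(min = min(delay), max = max(delay), count = .N,
               country = country[1L]),
        by = "id"]

d1
d2
```

Note that in both results the grouping column `id` is already present, so no extra `id=unique(...)` entry is needed in the list.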
Thanks.
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il
http://ffii.org http://pmw.org.il http://mideasttruth.com
Programming is like sex: one mistake and you have to support it for a lifetime.