[R] uniq -c

Sam Steingold sds at gnu.org
Tue Oct 16 20:48:57 CEST 2012


> * Duncan Murdoch <zheqbpu.qhapna at tznvy.pbz> [2012-10-16 14:22:51 -0400]:
>
> On 16/10/2012 1:46 PM, Sam Steingold wrote:
>> > * Duncan Murdoch <zheqbpu.qhapna at tznvy.pbz> [2012-10-16 12:47:36 -0400]:
>> >
>> > On 16/10/2012 12:29 PM, Sam Steingold wrote:
>> >> x is sorted.
>> > sparseby(data=x, INDICES=x, FUN=nrow)
>>
>> this takes forever; apparently, it does not use the fact that x is
>> sorted (even then - it should not take more than a few minutes)...
>
> It was more or less instantaneous on the examples you posted.  It
> would be a bit more honest to say "it was fast on the examples, but it
> was very slow when I ran it on my real data, which consists of
> 100000000000000 cases."

Sure, I did not mean any insult to your code, sorry.
All I was saying was that it was too slow for my purposes because it
ignores the fact that the data is sorted.
It turned out that paste+sort+rle+strsplit (paste each row into one
string, sort, count the runs with rle, split the strings back into
columns) is fast enough.
(Although there should be a way to avoid paste/strsplit!)
Thanks!
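
A minimal sketch of that pipeline, plus one possible way to avoid
paste/strsplit by comparing adjacent rows. The data frame x below, the
separator, and all variable names are made up for illustration (the real
data is not shown in this thread). The sketch assumes x is already
sorted, has at least one row, and contains no NAs.

```r
# Hypothetical sorted data frame standing in for x (not the thread's data).
x <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                b = c("u", "u", "v", "v", "v", "w"),
                stringsAsFactors = FALSE)

# Approach 1: paste + rle + strsplit.
# Each row is collapsed into one key string; rle() counts runs of equal keys.
# sort() is skipped here because x is assumed to be sorted already.
sep <- "\r"                                  # assumed not to occur in the data
key <- do.call(paste, c(x, sep = sep))       # one string per row
runs <- rle(key)                             # run values and run lengths
parts <- do.call(rbind, strsplit(runs$values, sep, fixed = TRUE))
counts_paste <- data.frame(parts, count = runs$lengths,
                           stringsAsFactors = FALSE)
names(counts_paste)[seq_along(x)] <- names(x)  # note: columns are now character

# Approach 2: no paste/strsplit; compare every row with its predecessor.
n <- nrow(x)
# TRUE where row i+1 differs from row i in at least one column
changed <- Reduce(`|`, lapply(x, function(col) col[-1] != col[-n]))
starts <- c(1L, which(changed) + 1L)         # first row of each run
counts_diff <- data.frame(x[starts, , drop = FALSE],
                          count = diff(c(starts, n + 1L)),
                          stringsAsFactors = FALSE)

print(counts_paste)
print(counts_diff)
```

The second approach keeps the original column types and relies on the
sort order directly, since equal rows are then guaranteed to be adjacent.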

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://camera.org http://truepeace.org
http://jihadwatch.org http://www.PetitionOnline.com/tap12009/
Every day above ground is a good day.




More information about the R-help mailing list