[R] how to efficiently compute set unique?
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Jun 22 03:18:00 CEST 2010
On 21/06/2010 9:06 PM, G FANG wrote:
> Hi,
>
> I want to get the unique set from a large numeric k by 1 vector, k is
> in tens of millions
>
> when I used the matlab function unique, it takes less than 10 secs
>
> but when I tried to use the unique in R with similar CPU and memory,
> it is not done in minutes
>
> I am wondering, am I using the function in the right way?
>
> dim(cntxtn)
> [1] 13584763 1
> uniqueCntxt = unique(cntxtn); # this is taking really long
What type is cntxtn? If I do that sort of thing on a numeric vector,
it's quite fast:
> x <- sample(100000, size=13584763, replace=TRUE)
> system.time(unique(x))
   user  system elapsed
   3.61    0.14    3.75
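The dim() output quoted above shows two dimensions, so cntxtn may be a one-column matrix or data frame rather than a plain vector. In that case unique() would use its row-wise method, which is typically much slower than the vector method. A minimal sketch of checking the type and dropping to a vector first, assuming that is what is going on; the matrix m and the helper unique_column are hypothetical stand-ins, not objects from the original post:

```r
# Hypothetical stand-in for cntxtn: a k-by-1 matrix (k kept small so this runs quickly)
set.seed(1)
k <- 100000
m <- matrix(sample(1000, size = k, replace = TRUE), ncol = 1)

# Inspect what unique() would dispatch on
class(m)
dim(m)

# Hypothetical helper: reduce a one-column matrix or data frame
# to a plain vector, then call the vector method of unique()
unique_column <- function(x) {
  if (is.data.frame(x)) x <- x[[1]]  # first column of a data frame
  unique(as.vector(x))               # as.vector() drops the dim attribute
}

u <- unique_column(m)

# The row-wise method returns the same values, as a one-column matrix
stopifnot(identical(u, unique(m)[, 1]))
```

Wrapping each call in system.time(), as in the timing above, would show whether the row-wise method is where the time goes on the real data.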