Henrik Bengtsson hb at stat.berkeley.edu
Wed Oct 28 21:03:41 CET 2009

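unlist(..., use.names=FALSE) is heaps faster than the default
unlist(..., use.names=TRUE), which also has to build a name for every
element of the result (roughly 50x by elapsed time in the timings
below), cf.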

> z <- split(sample(1000,1e6,rep=TRUE),rep(1:1e5,10))
> system.time(y1 <- Reduce(union,z))
   user  system elapsed
   5.98    0.00    5.89
> system.time(y2 <- unique(unlist(z)))
   user  system elapsed
   2.62    0.02    2.51
> system.time(y2b <- unique(unlist(z, use.names=FALSE)))
   user  system elapsed
   0.03    0.00    0.05
> system.time(y3 <- unique(do.call(c,z)))
   user  system elapsed
   2.28    0.03    2.37
> identical(y1,y2)
[1] TRUE
> identical(y1,y2b)
[1] TRUE
> identical(y2,y3)
[1] TRUE
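
For what it's worth, here is a small sketch of where the extra time goes.
It uses a smaller list than the timings above so that it runs instantly;
the set.seed() call and the names u1 and u2 are mine, added only for
illustration. With use.names=TRUE the result carries one name per
element, and unique() throws those names away again, so building them is
wasted work here.

```r
# Smaller version of z (1e3 groups of 10 values); sizes are an assumption
# chosen so the example runs quickly.
set.seed(1)
z <- split(sample(1000, 1e4, replace = TRUE), rep(1:1e3, 10))

u1 <- unlist(z)                     # default: use.names = TRUE
u2 <- unlist(z, use.names = FALSE)  # skips building the names

length(names(u1))                   # one name per element of the result
is.null(names(u2))                  # no names attached at all

# The values are the same either way, and unique() drops the names,
# so both routes give an identical answer.
identical(unname(u1), u2)
identical(unique(u1), unique(u2))
```

So whenever the names are not needed afterwards, use.names=FALSE should
be safe to use.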

On Wed, Oct 28, 2009 at 12:51 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> ... and just for amusement: unique(do.call(c,l))
>
> The do.call and unlist approaches should be faster than Reduce; do.call
> _may_ be marginally faster than unlist. Here's a timing comparison:
>
>
>> z <- split(sample(1000,1e6,rep=TRUE),rep(1:1e5,10))
>> length(z)
> [1] 100000
>
> ## the comparisons:
>
>> system.time(y1 <- Reduce(union,z))
>   user  system elapsed
>   5.02    0.00    5.03
>
>> system.time(y2 <- unique(unlist(z)))
>   user  system elapsed
>   1.92    0.00    1.92
>
>> system.time(y3 <- unique(do.call(c,z)))
>   user  system elapsed
>   1.75    0.00    1.75
>
>> identical(y1,y2)
> [1] TRUE
>> identical(y2,y3)
> [1] TRUE
>
> Obviously, this is unlikely to matter for any reasonably sized dataset, but
> maybe it's instructive.
>
> Of course, Reduce wins the RGolf contest  ;-)
>
