[R] union of two sets are smaller than one set?

Duncan Murdoch
Sun Jan 31 22:11:44 CET 2021


On 31/01/2021 3:57 p.m., Martin Møller Skarbiniks Pedersen wrote:
> This is really puzzling me, and when I try to make a small example
> everything works as expected.
> 
> The problem:
> 
> I got these two large vectors of strings.
> 
>> str(s1)
>   chr [1:766608] "0.dk" ...
>> str(s2)
>   chr [1:59387] "043.dk" "0606.dk" "0618.dk" "0888.dk" "0iq.dk" "0it.dk" ...
> 
> And I need to create the union-set of s1 and s2.
> I expect the size of the union-set to be between 766608 and 766608+59387.
> However it is 681193, which is less than the number of elements in s1!
> 
>> length(base::union(s1, s2))
> [1] 681193
> 
> Any hints?

I imagine s1 contains duplicates, so unique(s1) is shorter than s1.  The union function is the same as

unique(c(s1, s2))

for your data.  (The only difference is if s1 or s2 is named:  the names 
are dropped.)

Duncan Murdoch
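
A minimal sketch of the effect described above, using small made-up vectors
(the toy values and the names n_s1, n_unique, n_union and n_dup are
illustrative assumptions, not the original poster's data).  It shows that
when s1 contains duplicates, the union can be shorter than s1 itself:

```r
# Toy stand-ins for the real s1 and s2 (hypothetical values)
s1 <- c("a.dk", "b.dk", "a.dk", "c.dk", "b.dk")  # 5 elements, only 3 distinct
s2 <- c("c.dk", "d.dk")

n_s1     <- length(s1)             # 5
n_unique <- length(unique(s1))     # 3: duplicates removed
n_union  <- length(union(s1, s2))  # 4: fewer than length(s1)

# For plain character vectors, union() gives the same result as unique(c(...))
same <- identical(union(s1, s2), unique(c(s1, s2)))  # TRUE

# Number of repeated entries in s1
n_dup <- sum(duplicated(s1))       # 2
```

On the real data, comparing length(s1) with length(unique(s1)), or checking
anyDuplicated(s1), should confirm whether duplicates explain the result.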


