[R] union of two sets are smaller than one set?
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Sun Jan 31 22:11:44 CET 2021
On 31/01/2021 3:57 p.m., Martin Møller Skarbiniks Pedersen wrote:
> This is really puzzling me, and when I try to make a small example
> everything works as expected.
>
> The problem:
>
> I have these two large vectors of strings.
>
>> str(s1)
> chr [1:766608] "0.dk" ...
>> str(s2)
> chr [1:59387] "043.dk" "0606.dk" "0618.dk" "0888.dk" "0iq.dk" "0it.dk" ...
>
> And I need to create the union-set of s1 and s2.
> I expect the size of the union-set to be between 766608 and 766608+59387.
> However, it is 681193, which is less than the number of elements in s1!
>
>> length(base::union(s1, s2))
> [1] 681193
>
> Any hints?
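I imagine unique(s1) is shorter than s1, i.e. s1 contains duplicated
entries. For your data, the union function is the same as
unique(c(s1, s2))
so duplicates within s1 are removed too, and the result can be shorter
than s1. (The only difference is if s1 or s2 is named: the names
are dropped.)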
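
A minimal sketch of this, using small made-up vectors rather than the
real data (the values of s1 and s2 below are hypothetical):

```r
# Toy stand-ins for the real vectors; "0.dk" is repeated on purpose.
s1 <- c("0.dk", "0.dk", "0.dk", "043.dk")
s2 <- c("043.dk", "0606.dk")

n_dups <- sum(duplicated(s1))      # repeated entries in s1
u <- union(s1, s2)                 # duplicates are removed here

length(s1)                         # 4
n_dups                             # 2
length(u)                          # 3, smaller than length(s1)
identical(u, unique(c(s1, s2)))    # TRUE

# The bounds that do hold are in terms of the unique elements:
length(unique(s1)) <= length(u)                         # TRUE
length(u) <= length(unique(s1)) + length(unique(s2))    # TRUE
```

On the real data, sum(duplicated(s1)) or anyDuplicated(s1) should show
whether s1 has repeated entries.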
Duncan Murdoch