[R] millions of comparisons, speed wanted

Adrian DUSA adi at roda.ro
Thu Dec 15 21:04:01 CET 2005


Dear Andy,

On Thursday 15 December 2005 20:57, Liaw, Andy wrote:
> Just some untested idea:
> If the data are all 0/1, you could use dist(input, method="manhattan"), and
> then check which entry equals 1.  This should be much faster than creating
> all pairs of rows and check position-by-position.
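A minimal sketch of the idea, in plain Python rather than R's dist(). It shows only the logic: the speed gain Andy mentions comes from dist() being compiled, not from the algorithm. For 0/1 rows, the Manhattan distance counts the positions where two rows differ, so a distance of exactly 1 marks a pair that differs in a single position. The helper names and the third example row are hypothetical, added for illustration.

```python
from itertools import combinations

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length 0/1 rows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pairs_differing_once(rows):
    """Return index pairs (i, j), i < j, whose rows differ in exactly one position.
    Hypothetical helper: the R equivalent would inspect dist(input, method="manhattan")."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(rows), 2)
            if manhattan(a, b) == 1]

rows = [(0, 1, 0, 1, 1),
        (0, 0, 0, 1, 1),
        (1, 0, 0, 1, 1)]  # third row is made-up example data
print(pairs_differing_once(rows))  # [(0, 1), (1, 2)]
```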

Thanks for the idea; I played with it a little. Yes, at the beginning the data
are all 0/1, but during the minimization iterations "x" values also appear.
For example, comparing:
0 1 0 1 1
0 0 0 1 1
should return
0 "x" 0 1 1

whereas
0 "x" 0 1 1
0 0 0 1 1
should not be compared at all (they have a different number of figures: four
0/1 values versus five).
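A minimal Python sketch of the comparison rule in the two examples. It rests on one assumption: two rows are comparable only if they have "x" in exactly the same positions, which is the rule of the Quine-McCluskey-style combination step. The post itself only states the weaker condition about the number of figures. If the rows are comparable and differ in exactly one 0/1 position, that position becomes "x". The function name `combine` is hypothetical.

```python
X = "x"  # marker for an eliminated position, as in the examples above

def combine(a, b):
    """Hypothetical helper: merge two rows that have "x" in the same positions
    and differ in exactly one 0/1 position; return None otherwise."""
    if len(a) != len(b):
        return None
    # Assumed rule: rows with "x" in different places are not comparable.
    if any((p == X) != (q == X) for p, q in zip(a, b)):
        return None
    diffs = [k for k, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diffs) != 1:
        return None
    merged = list(a)
    merged[diffs[0]] = X  # the single differing position is eliminated
    return tuple(merged)

print(combine((0, 1, 0, 1, 1), (0, 0, 0, 1, 1)))  # (0, 'x', 0, 1, 1)
print(combine((0, X, 0, 1, 1), (0, 0, 0, 1, 1)))  # None
```

Checking the "x" positions first is what prevents a mixed pair like the second one from being treated as differing in only one place.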

Replacing "x" with NA does not help with dist either: for
NA 0 0 1 1
0 0 0 1 1
dist returns 0, because it skips the NA position and the remaining four
values are identical.
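A small Python sketch reproducing this pitfall. R's dist() documentation says that missing values are excluded, and that for Manhattan distance the sum is scaled up in proportion to the number of columns used. The sketch mimics that rule with NaN in place of NA. The function name is hypothetical.

```python
import math

def manhattan_skip_missing(a, b):
    """Hypothetical helper: Manhattan distance that ignores positions where
    either value is NaN, then scales the sum up to the full row length."""
    used = [(p, q) for p, q in zip(a, b)
            if not (math.isnan(p) or math.isnan(q))]
    if not used:
        return math.nan  # nothing left to compare
    total = sum(abs(p - q) for p, q in used)
    return total * len(a) / len(used)

nan = math.nan
print(manhattan_skip_missing([nan, 0, 0, 1, 1], [0, 0, 0, 1, 1]))  # 0.0
```

Because the missing position is simply dropped, the "x"-versus-0 mismatch is invisible. The distance therefore cannot tell these two rows apart from identical ones.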

I even thought about tweaking the dist code, but it calls compiled C code, so
I gave up.

Nice idea anyhow; maybe I'll find a way to use it further.
Best,
Adrian

-- 
Adrian DUSA
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101
