[R] Calculate Closest 5 Cases?
dsheuman@rogers.com
Fri Feb 13 17:34:14 CET 2004
I've only begun investigating R as a substitute for SPSS.
I need to identify, for each CASE, the 5 closest (most similar) other CASES, excluding the case itself, since it is trivially its own closest match. I have a fairly large matrix (50,000 cases by 50 variables). In SPSS I can use Correlate > Distances to generate a similarity matrix, but only on a small sample; the full matrix cannot be processed at once because of memory limitations.
The data are all percentages, so the variables are directly comparable.
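To make the memory problem concrete: a full 50,000 x 50,000 matrix of 8-byte doubles needs about 20 GB (roughly half that for one triangle), which is why computing it in one pass fails. The minimal Python sketch below (an illustration of the technique, not R code, and not a built-in SPSS or R procedure) never builds that matrix. For each case it streams the distances to every other case and keeps only the k smallest. It assumes Euclidean distance on the raw values, which seems reasonable here because every variable is a percentage on the same scale; `squared_distance` and `closest_k` are hypothetical names.

```python
import heapq
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (same ranking as Euclidean, no sqrt needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_k(cases: Dict[int, Sequence[float]], k: int = 5) -> Dict[int, List[int]]:
    """For each case id, return the ids of its k nearest *other* cases.

    Assumption: Euclidean distance on the raw variables.
    The full distance matrix is never stored: distances for one case are
    generated lazily and heapq.nsmallest keeps only k of them, so the result
    holds n * k ids instead of n * n distances. Ties go to the smaller id.
    """
    ids = sorted(cases)
    result: Dict[int, List[int]] = {}
    for i in ids:
        row = cases[i]
        # (distance, id) pairs for every other case; the case itself is
        # skipped because it is always its own closest match.
        candidates = ((squared_distance(row, cases[j]), j) for j in ids if j != i)
        result[i] = [j for _, j in heapq.nsmallest(k, candidates)]
    return result
```

In pure Python this is slow at full size (about 2.5 billion distance evaluations for 50,000 cases); a vectorized version that processes rows in blocks would follow the same keep-only-k idea while staying within memory.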
Is there any way to do this in R?
Below is a small sample of the data (from SPSS) and the desired output.
Thanks,
Danny
*Sample Data.
DATA LIST LIST /id(F8) var1(F8.2) var2(F8.2) var3(F8.2) var4(F8.2) var5(F8.2)
  var6(F8.2) var7(F8.2) var8(F8.2) var9(F8.2) var10(F8.2) var11(F8.2).
BEGIN DATA.
10170069 3.51 4.02 6.53 11.05 6.53 8.04 13.57 20.10 11.05 8.55 7.04
10190229 1.89 5.66 4.61 7.62 8.45 13.21 9.50 20.82 16.07 9.36 3.77
10540023 3.40 5.08 3.39 4.52 10.18 14.71 13.56 16.38 9.60 7.89 11.85
10650413 6.64 6.64 3.73 4.70 3.78 13.23 19.82 15.98 12.26 8.48 3.78
10662074 5.11 5.81 4.37 5.11 6.55 14.60 18.97 11.68 10.25 8.75 8.79
10770041 6.43 4.17 6.34 4.26 6.34 4.26 19.11 19.20 14.95 12.77 4.35
11010422 3.14 4.71 6.81 7.85 5.75 6.81 15.18 15.18 13.61 11.00 9.44
11060762 7.03 5.03 6.95 5.99 5.92 12.94 15.01 12.06 11.98 8.06 9.02
11070078 4.61 9.22 4.61 7.94 6.27 12.75 14.02 20.49 7.75 7.75 4.61
11180646 4.48 5.35 6.29 5.42 4.55 11.71 20.74 15.32 14.45 8.09 3.61
11460001 5.71 7.34 6.48 5.68 4.07 10.55 13.83 18.69 12.15 9.76 4.87
11650133 6.00 3.72 6.72 6.00 7.50 17.94 13.44 16.37 13.51 5.15 3.65
11650275 4.02 8.06 6.06 8.10 5.06 8.10 17.16 14.12 12.14 14.12 4.02
11780034 4.25 4.28 5.30 5.33 6.38 14.88 15.96 18.08 14.85 7.48 3.20
11790016 4.40 4.40 5.54 4.40 4.40 10.93 17.67 19.72 13.20 12.13 4.33
12660338 6.60 7.54 5.66 8.49 10.38 11.31 16.06 12.26 8.49 8.49 4.73
12660644 5.51 3.14 3.95 7.09 7.11 14.98 15.72 18.90 9.44 5.50 8.65
12661667 5.44 4.50 5.44 4.50 5.44 12.69 13.63 11.81 9.07 13.68 13.79
END DATA.
*Output should be:.
*.
* ID1 CLOSEID1 CLOSEID2 CLOSEID3 CLOSEID4 CLOSEID5.
* ID2 CLOSEID1 CLOSEID2 CLOSEID3 CLOSEID4 CLOSEID5.
* ID3 CLOSEID1 CLOSEID2 CLOSEID3 CLOSEID4 CLOSEID5.
* ID4 CLOSEID1 CLOSEID2 CLOSEID3 CLOSEID4 CLOSEID5.
* ID5 CLOSEID1 CLOSEID2 CLOSEID3 CLOSEID4 CLOSEID5.
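A companion Python sketch, again with hypothetical function names, covers the input and output ends: `parse_cases` reads records in the `id var1 ... var11` layout of the BEGIN DATA block, and `format_closest` writes one row per ID followed by CLOSEID1 to CLOSEID5, nearest first. Tokens are read in free format, so a record whose last value wraps onto the next line, as can happen when data are pasted into e-mail, is still read correctly. Comma-separated output is an assumption, chosen so the result is easy to read back into SPSS or R.

```python
import csv
import io
from typing import Dict, List


def parse_cases(text: str, n_vars: int = 11) -> Dict[int, List[float]]:
    """Parse whitespace-separated 'id v1 ... vN' records into {id: [v1, ..., vN]}.

    Records may wrap across lines: tokens are grouped purely by count.
    """
    tokens = text.split()
    width = n_vars + 1  # one id plus n_vars values per case
    if len(tokens) % width != 0:
        raise ValueError(f"expected a multiple of {width} tokens, got {len(tokens)}")
    cases: Dict[int, List[float]] = {}
    for start in range(0, len(tokens), width):
        record = tokens[start:start + width]
        cases[int(record[0])] = [float(v) for v in record[1:]]
    return cases


def format_closest(neighbours: Dict[int, List[int]], k: int = 5) -> str:
    """Return CSV text: a header row, then ID and CLOSEID1..CLOSEIDk per case."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ID"] + [f"CLOSEID{n}" for n in range(1, k + 1)])
    for case_id in sorted(neighbours):
        writer.writerow([case_id, *neighbours[case_id][:k]])
    return buf.getvalue()
```

The `neighbours` argument is a mapping from each ID to its closest IDs, nearest first, as produced by any k-nearest-neighbour routine.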