[R] Calculate Closest 5 Cases?

dsheuman@rogers.com dsheuman at rogers.com
Fri Feb 13 17:34:14 CET 2004


I've only begun investigating R as a substitute for SPSS.

For each case, I need to identify the 5 closest (most similar) other 
cases, excluding the case itself, since it is automatically its own closest 
match.  I have a fairly large matrix (50000 cases by 50 vars).  In SPSS, I 
can use Correlate > Distances to generate a similarity matrix, but only on 
a small sample.  The entire matrix cannot be processed at once due to 
memory limitations.

The data are all percentages, so they are easily comparable.

Is there any way to do this in R?

Below is a small sample of the data (from SPSS) and the desired output.

Thanks,

Danny




*Sample Data.
DATA LIST LIST /id(F8) var1(F8.2) var2(F8.2) var3(F8.2) var4(F8.2) var5
(F8.2) var6(F8.2) var7(F8.2) var8(F8.2) var9(F8.2) var10(F8.2) var11(F8.2).
BEGIN DATA.
10170069	3.51	4.02	6.53	11.05	6.53	8.04	13.57	20.10	11.05	8.55	7.04
10190229	1.89	5.66	4.61	7.62	8.45	13.21	9.50	20.82	16.07	9.36	3.77
10540023	3.40	5.08	3.39	4.52	10.18	14.71	13.56	16.38	9.60	7.89	11.85
10650413	6.64	6.64	3.73	4.70	3.78	13.23	19.82	15.98	12.26	8.48	3.78
10662074	5.11	5.81	4.37	5.11	6.55	14.60	18.97	11.68	10.25	8.75	8.79
10770041	6.43	4.17	6.34	4.26	6.34	4.26	19.11	19.20	14.95	12.77	4.35
11010422	3.14	4.71	6.81	7.85	5.75	6.81	15.18	15.18	13.61	11.00	9.44
11060762	7.03	5.03	6.95	5.99	5.92	12.94	15.01	12.06	11.98	8.06	9.02
11070078	4.61	9.22	4.61	7.94	6.27	12.75	14.02	20.49	7.75	7.75	4.61
11180646	4.48	5.35	6.29	5.42	4.55	11.71	20.74	15.32	14.45	8.09	3.61
11460001	5.71	7.34	6.48	5.68	4.07	10.55	13.83	18.69	12.15	9.76	4.87
11650133	6.00	3.72	6.72	6.00	7.50	17.94	13.44	16.37	13.51	5.15	3.65
11650275	4.02	8.06	6.06	8.10	5.06	8.10	17.16	14.12	12.14	14.12	4.02
11780034	4.25	4.28	5.30	5.33	6.38	14.88	15.96	18.08	14.85	7.48	3.20
11790016	4.40	4.40	5.54	4.40	4.40	10.93	17.67	19.72	13.20	12.13	4.33
12660338	6.60	7.54	5.66	8.49	10.38	11.31	16.06	12.26	8.49	8.49	4.73
12660644	5.51	3.14	3.95	7.09	7.11	14.98	15.72	18.90	9.44	5.50	8.65
12661667	5.44	4.50	5.44	4.50	5.44	12.69	13.63	11.81	9.07	13.68	13.79
END DATA.

*Output should be:.
*.
*	ID1	CLOSEID1	CLOSEID2	CLOSEID3	CLOSEID4	CLOSEID5.
*	ID2	CLOSEID1	CLOSEID2	CLOSEID3	CLOSEID4	CLOSEID5.
*	ID3	CLOSEID1	CLOSEID2	CLOSEID3	CLOSEID4	CLOSEID5.
*	ID4	CLOSEID1	CLOSEID2	CLOSEID3	CLOSEID4	CLOSEID5.
*	ID5	CLOSEID1	CLOSEID2	CLOSEID3	CLOSEID4	CLOSEID5.
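
For illustration, here is a minimal sketch of one way the task described above might be approached in base R. It is not a tested solution for the full data set. It assumes Euclidean distance is an acceptable similarity measure for percentage data, and the function name closest_k and its arguments (ids, k, block_size) are hypothetical names introduced here. The full 50000 x 50000 distance matrix would need roughly 20 GB as doubles (50000^2 x 8 bytes), so the sketch computes distances one block of rows at a time and keeps only the k nearest ids for each case.

```r
# Hypothetical helper: for each row of x, return the ids of its k nearest
# OTHER rows by Euclidean distance. Rows are processed in blocks so that
# only a (block_size x n) slice of the distance matrix exists at any time.
closest_k <- function(x, ids = seq_len(nrow(x)), k = 5, block_size = 500) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(length(ids) == n, k < n)

  sq_norm <- rowSums(x^2)                 # squared length of every row
  out <- matrix(ids[1], nrow = n, ncol = k,
                dimnames = list(NULL, paste0("CLOSEID", seq_len(k))))

  for (start in seq(1, n, by = block_size)) {
    rows <- start:min(start + block_size - 1, n)

    # Squared distances via |a|^2 + |b|^2 - 2 a.b; the ordering is the
    # same as for the distances themselves, so no sqrt is needed.
    d2 <- outer(sq_norm[rows], sq_norm, "+") -
      2 * tcrossprod(x[rows, , drop = FALSE], x)

    # Exclude each case from its own list of neighbours.
    d2[cbind(seq_along(rows), rows)] <- Inf

    for (i in seq_along(rows)) {
      out[rows[i], ] <- ids[order(d2[i, ])[seq_len(k)]]
    }
  }
  out
}

# Toy usage with made-up values (not the sample data above):
toy <- rbind(c(0, 0), c(1, 0), c(0, 3), c(10, 10))
closest_k(toy, ids = c("a", "b", "c", "d"), k = 2)
```

Row i of the result lists the ids of the k nearest cases to case i, nearest first, which matches the layout of the desired output when combined with the id column, e.g. cbind(ID = ids, result). A smaller block_size lowers peak memory at the cost of more loop iterations.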



