[R] Sparse KMeans/KDE/Nearest Neighbors?

manyu_aditya abhimanyu.aditya at gmail.com
Wed Feb 24 23:00:31 CET 2010


I have a dataset (the Netflix dataset) with roughly 18k columns and a
variable number of rows; let's assume 25 thousand for now. The dataset is
very sparse. I was wondering how to do k-means, nearest neighbors, or
kernel density estimation on it.

I tried using the spMatrix function in the "Matrix" package. I think I'm able
to create the matrix, but as soon as I pass it to the kmeans function in
package "stats" it says it cannot allocate 3.3Gb, which is basically
18k * 25k * 8 bytes, i.e. the full dense matrix of doubles.

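To make the arithmetic above concrete, here is a rough sketch in base R. The dense figure follows directly from the dimensions in the post. The sparse figure is only an estimate for a compressed sparse column layout (8-byte values, 4-byte row indices, 4-byte column pointers), and the number of non-zero entries `nnz` is an assumed value of about 1% of the cells, not a number taken from the data.

```r
# Dimensions quoted in the post
n_rows <- 25000
n_cols <- 18000

# Dense storage: one 8-byte double per cell
dense_bytes <- n_rows * n_cols * 8
dense_gib   <- dense_bytes / 1024^3   # about 3.35 GiB

# Sparse (compressed column) storage, under an ASSUMED non-zero count:
# 8 bytes per value + 4 bytes per row index + 4 bytes per column pointer
nnz          <- 4.5e6                 # hypothetical: about 1% of all cells
sparse_bytes <- nnz * (8 + 4) + (n_cols + 1) * 4
sparse_mib   <- sparse_bytes / 1024^2

cat(sprintf("dense:  %.2f GiB\n", dense_gib))
cat(sprintf("sparse: %.1f MiB (assuming nnz = %g)\n", sparse_mib, nnz))
```

So the allocation error is what one would expect whenever a function converts the sparse matrix to an ordinary dense matrix before working on it.
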
There is a sparse kmeans solver by Tibshirani, but that expects a regular
dense-format matrix, so again the issue is the same.
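
For illustration only, here is a minimal sketch of plain Lloyd-style k-means that touches the sparse matrix only through matrix products, so the full n x p matrix is never densified. It is not the Tibshirani solver and not anything from the "stats" package; the function name `sparse_kmeans` and its arguments `iters` and `seed` are made up for this example. It assumes the "Matrix" package is installed and that k is small enough for the k x p centroid matrix and the n x k distance matrix to fit in memory.

```r
library(Matrix)

# Hypothetical helper: Lloyd's k-means on a sparse matrix X (rows = points).
sparse_kmeans <- function(X, k, iters = 20, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  # Initial centroids: k randomly chosen rows, stored densely (k x p)
  C <- as.matrix(X[sample.int(n, k), , drop = FALSE])
  x_sq <- rowSums(X^2)            # squared norm of every point
  cl <- integer(n)
  for (it in seq_len(iters)) {
    # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * x.c, computed as an n x k matrix
    cross <- as.matrix(X %*% t(C))
    d2 <- outer(x_sq, rowSums(C^2), "+") - 2 * cross
    new_cl <- max.col(-d2, ties.method = "first")   # nearest centroid
    if (all(new_cl == cl)) break                    # assignments stable
    cl <- new_cl
    # Sparse k x n membership matrix; M %*% X sums the rows of each cluster
    M <- sparseMatrix(i = cl, j = seq_len(n), x = 1, dims = c(k, n))
    counts <- rowSums(M)
    sums <- as.matrix(M %*% X)
    keep <- counts > 0            # leave empty clusters where they were
    C[keep, ] <- sums[keep, , drop = FALSE] / counts[keep]
  }
  list(cluster = cl, centers = C)
}

# Tiny made-up example: rows 1-2 are identical, rows 3-4 are identical
X <- sparseMatrix(i = c(1, 1, 2, 2, 3, 3, 4, 4),
                  j = c(1, 2, 1, 2, 5, 6, 5, 6),
                  x = c(5, 4, 5, 4, 3, 5, 3, 5),
                  dims = c(4, 6))
fit <- sparse_kmeans(X, k = 2)
print(fit$cluster)
```

Like any Lloyd-style procedure this only finds a local optimum and depends on the random start, so several seeds would normally be tried.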

A simple "no, this is not possible" answer shall suffice as long as you are

Thanks much.
View this message in context: http://n4.nabble.com/Sparse-KMeans-KDE-Nearest-Neighbors-tp1568129p1568129.html
Sent from the R help mailing list archive at Nabble.com.
