[R] pam() clustering for large data sets

Lilia Nedialkova lbravewo at princeton.edu
Tue May 17 00:26:25 CEST 2011


Hello everyone,

I need to perform k-medoids clustering on a data set of 50,000
observations.  I computed the pairwise distances between the observations
separately and tried to pass them to pam().

I got the "cannot allocate vector of length" error, and I realize this
job is too memory-intensive.  I am at a bit of a loss about what to do
at this point.
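
For scale, here is a rough back-of-the-envelope estimate.  It assumes the
distances are held in a standard R dist object, which stores only the lower
triangle as one 8-byte double per pair:

```latex
\frac{n(n-1)}{2} = \frac{50{,}000 \times 49{,}999}{2} = 1{,}249{,}975{,}000 \ \text{distances}

1{,}249{,}975{,}000 \times 8 \ \text{bytes} = 9{,}999{,}800{,}000 \ \text{bytes} \approx 10 \ \text{GB} \ (\approx 9.3 \ \text{GiB})
```

So the input alone needs roughly 10 GB, and any additional copy made during
the call would need the same amount again.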

I can't use clara() because it works from the raw data rather than a
dissimilarity object, and I want to use the distances I have already computed.

What do people do to cluster data sets this large?

I would greatly appreciate any suggestions.

Thank you very much in advance.
