[R] parallel clustering, amap, hcluster

Ziqi Zhang ziqi.zhang at sheffield.ac.uk
Sat Aug 8 13:15:50 CEST 2015


Hi
I am looking for a parallel implementation of hierarchical clustering, the 
equivalent of "hclust" in the "stats" package.

I found "hcluster" from "amap" package:

hcluster(x, method = "euclidean", diag = FALSE, upper = FALSE,
          link = "complete", members = NULL, nbproc = 2,
          doubleprecision = TRUE)

It takes a data matrix, computes the distance matrix, and then does the 
clustering. However, in my application I have to compute the distance 
matrix and use it later anyway. So hcluster re-computes the distances, 
which is a waste of time, as my data set is very large.
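
To illustrate the workflow in question, here is a minimal sketch on toy 
data. It assumes that amap's Dist() accepts an nbproc argument, as its 
documentation describes, and falls back to stats::dist() if amap is not 
installed. Only the distance step would run in parallel here; hclust() 
itself is serial, so this is not a parallel clustering.

```r
# Sketch only: compute the distance object once, then reuse it.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)   # toy data: 20 observations, 10 variables

# Step 1: distance object, computed once.
# Assumption: amap::Dist() takes nbproc; otherwise use serial stats::dist().
d <- if (requireNamespace("amap", quietly = TRUE)) {
  amap::Dist(x, method = "euclidean", nbproc = 2)
} else {
  dist(x, method = "euclidean")
}

# Step 2: clustering on the pre-computed distances (stats::hclust, serial).
hc <- hclust(d, method = "complete")

# The same distance object d stays available for later steps.
groups <- cutree(hc, k = 3)
```

The object d can then be passed on to whatever needs the distances later, 
without computing them a second time.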

Is there any way hcluster could just use a pre-computed distance object, 
or a way to obtain the distance object from hcluster, so that I can avoid 
computing the distance object twice?

Or, as a more general question: is there a parallel implementation of 
hierarchical clustering that takes a distance matrix as input, rather than 
the raw data matrix?
Many thanks!
