[R-sig-Geo] complete linkage Agglomerative hierarchical clustering, nnclust, spatclus or something else?
Thomas Lumley
tlumley at u.washington.edu
Wed Apr 21 17:53:21 CEST 2010
You asked earlier about nnclust: it does single-linkage rather than complete-linkage clustering. That is, it defines clusters so that each point in a cluster has a nearest neighbour in the same cluster closer than the threshold distance. This produces clusters that are much less compact (often elongated or chain-like rather than roughly circular) than those from complete-linkage clustering.
The main distinctive feature of nnclust is that it is feasible even for quite large data sets, taking linear space and roughly n log n time.
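To illustrate the difference on a toy example, here is a minimal sketch that uses only base R's hclust() and cutree(), not nnclust itself. The ten points on a line and the threshold of 1.5 are made up for illustration.

```r
# Ten points on a line, each 1 unit from its neighbour (illustrative data).
x <- cbind(1:10, 0)
d <- dist(x)

# Cut both trees at the same (assumed) threshold distance of 1.5.
single   <- cutree(hclust(d, method = "single"),   h = 1.5)
complete <- cutree(hclust(d, method = "complete"), h = 1.5)

# Single linkage chains all points into one elongated cluster, because
# every point has a neighbour within 1.5 units.
length(unique(single))

# Complete linkage requires every pair of points in a cluster to be
# within 1.5 units, so the line is broken into several small pieces.
length(unique(complete))
```

With single linkage the whole line ends up in one cluster, while complete linkage cannot put more than two of these points together at that threshold.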
On Wed, 21 Apr 2010, Hans Ekbrand wrote:
> On Wed, Apr 21, 2010 at 03:14:46PM +0200, Roger Bivand wrote:
>> On Wed, 21 Apr 2010, Hans Ekbrand wrote:
> [...]
>>> Well, hclust was useful, once I understood how cutree works. What
>>> would be the benefit of dnearneigh(), is it faster?
>> For larger data sets, hclust needs a triangular distance matrix,
>> dnearneigh does not. Finding graph components in the output "nb" object
>> also seems conceptually more direct.
> OK, good to know if I run into trouble when using the code on larger
> data-sets later on.
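A rough sketch of both points, assuming the spdep package is installed. The simulated coordinates and the 0.05 threshold are made up for illustration. The distance object that hclust() needs grows as n(n-1)/2, whereas dnearneigh() followed by n.comp.nb() labels the connected components of the threshold graph directly; those components correspond to single-linkage clusters at that distance.

```r
# Storage for the lower-triangular distance matrix used by hclust():
# n * (n - 1) / 2 doubles at 8 bytes each.
n <- 1e5
gib <- n * (n - 1) / 2 * 8 / 2^30   # size in GiB, about 37 for n = 1e5
gib

library(spdep)

set.seed(1)
xy <- cbind(runif(500), runif(500))   # simulated planar coordinates

# Neighbours within the (assumed) threshold distance of 0.05.
# Points with no neighbour that close may trigger a warning.
nb <- dnearneigh(xy, d1 = 0, d2 = 0.05)

# Connected components of the neighbour graph.
comp <- n.comp.nb(nb)
comp$nc                      # number of components
table(table(comp$comp.id))   # how many components of each size
```

For a data set this small, cutree(hclust(dist(xy), method = "single"), h = 0.05) should give the same partition, so the two routes can be checked against each other before moving to larger data.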
Thomas Lumley                  Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle