[R-sig-Geo] complete linkage Agglomerative hierarchical clustering, nnclust, spatclus or something else?

Thomas Lumley tlumley at u.washington.edu
Wed Apr 21 17:53:21 CEST 2010



You asked earlier about nnclust: it does single-linkage rather than complete-linkage clustering. That is, it defines clusters so that each point in a cluster has a nearest neighbour in the same cluster that is closer than the threshold distance. This produces clusters that are much less circular than those from complete-linkage clustering.

The main distinctive feature of nnclust is that it remains feasible even for quite large data sets, taking linear space and roughly n log n time.
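
To make the single- versus complete-linkage difference concrete, here is a small sketch using base R's hclust() and cutree() rather than nnclust itself. The toy data (two parallel strips of points) and the threshold of 1 are assumptions chosen for illustration. Note that this goes through dist(), so it builds the full triangular distance matrix and is only suitable for small data sets; it shows the cluster definition, not the nnclust algorithm.

```r
# Hypothetical toy data: two parallel strips, points 0.5 apart within a strip,
# strips 3 units apart.
x <- c(seq(0, 10, by = 0.5), seq(0, 10, by = 0.5))
y <- c(rep(0, 21), rep(3, 21))
pts <- cbind(x, y)

d <- dist(pts)      # full triangular distance matrix (fine for 42 points)
threshold <- 1      # assumed threshold distance

# Cut both trees at the same height.
single   <- cutree(hclust(d, method = "single"),   h = threshold)
complete <- cutree(hclust(d, method = "complete"), h = threshold)

# Single linkage chains neighbours together, so each strip is one long cluster.
n_single <- length(unique(single))      # 2: one cluster per strip

# Complete linkage bounds the largest within-cluster distance by the threshold,
# so each strip is broken into many short, compact pieces.
n_complete <- length(unique(complete))

table(single)
table(complete)
```

With the same threshold, single linkage follows each strip end to end, while complete linkage cannot form a cluster wider than the threshold.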

         -thomas


On Wed, 21 Apr 2010, Hans Ekbrand wrote:

> On Wed, Apr 21, 2010 at 03:14:46PM +0200, Roger Bivand wrote:
>> On Wed, 21 Apr 2010, Hans Ekbrand wrote:
>
> [...]
>
>>> Well, hclust was useful, once I understood how cutree works. What
>>> would be the benefit of dnearneigh(), is it faster?
>>>
>>
>> For larger data sets, hclust needs a triangular distance matrix,
>> dnearneigh does not. Finding graph components in the output "nb" object
>> also seems conceptually more direct.
>
> OK, good to know if I run into trouble when using the code on larger
> data-sets later on.
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


