[R] Cluster analysis: hclust manipulation possible?

Charles C. Berry cberry at tajo.ucsd.edu
Mon Nov 16 18:13:35 CET 2009


On Mon, 16 Nov 2009, Jopi Harri wrote:

> I am doing cluster analysis [hclust(Dist, method="average")] on
> data that potentially contains redundant objects. As expected,
> the inclusion of redundant objects affects the clustering result,
> i.e., the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to
> cluster differently from the same data without the redundancy,
> i.e., a1, b, c, d, e1. This is apparent when the outcome is
> visualized as a dendrogram.
>
> Now, it seems that the clustering result for which the redundancy
> has been eliminated is more robust for the present assignment
> than that of the redundant data. Naturally, there is no problem
> in the elimination: just exclude the redundant objects from Dist.
>
> However, it would be very convenient to be able to include the
> redundant objects in the *dendrogram* by attaching them as
> 0-level branches to the subtrees, i.e.:
>
> 1.0........-------........
> 0.5....___|__...._|_......
> 0.0.._|_..|..|..|.._|_....
> ....|.|.|.|..|..|.|...|...
> ...a1a2a3.b..c..d.e1.e2...
>
> instead of
>
> 1.0........-------........
> 0.5....___|__...._|_......
> 0.0...|...|..|..|...|.....
> ......a1..b..c..d..e1.....
>
> The question: Can this be accomplished in the *dendrogram plot*
> by manipulating the resulting hclust data structure or by some
> other means, and if yes, how?


Yes. Study

    ?hclust

particularly the 'Value' section, which describes the components of the
returned object (merge, height, order, labels, ...). From it you will see
what needs modification.
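Those components can be seen on a small case whose result can be checked by hand. The four points below are toy data chosen only for illustration. In $merge, a negative entry -k is observation k and a positive entry j is the cluster formed at step (row) j; $height holds one height per row of $merge, and $order is the left-to-right leaf order that plot() uses.

```r
## Toy data, chosen so the result can be checked by hand:
## d(1,2) = 1, d(3,4) = 2, and the average of the four distances between
## {1,2} and {3,4} is (5 + 7 + 4 + 6) / 4 = 5.5.
x <- c(0, 1, 5, 7)
hc <- hclust(dist(x), method = "average")

hc$merge   # rows (-1, -2), (-3, -4), (1, 2): -k is observation k,
           # +j is the cluster formed at step (row) j
hc$height  # 1.0 2.0 5.5: one merge height per row of hc$merge
hc$order   # 1 2 3 4: left-to-right leaf order used by plot(hc)
```

Attaching redundant objects therefore means adding rows to merge (with height 0), renumbering the positive entries of the old rows, and splicing the new observations into order.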


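Here is a very simple example. It clusters three objects, then adds a
copy of each (objects 4:6) joined to its original at height 0:

    set.seed(1)   # any seed; only for reproducibility
    res <- hclust(dist(1 - diag(3) * rnorm(3)))
    plot(res)
    res2 <- res
    ## steps 1:3 join object i and its copy i+3 at height 0; in the old
    ## merges, singleton -i becomes cluster i and step j becomes j + 3
    res2$merge <- rbind(-cbind(1:3, 4:6),
                        matrix(ifelse(res2$merge < 0, -res2$merge,
                                      res2$merge + sum(res2$merge < 0)), 2))
    res2$height <- c(rep(0, 3), res2$height)
    ## put each copy immediately after its original in the leaf order
    res2$order <- as.vector(rbind(res2$order, (4:6)[res2$order]))
    plot(res2)
    str(res)
    str(res2)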
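The same bookkeeping extends to several copies of one object (the a1 = a2 = a3 case in the question). The function below is a minimal sketch, not part of stats; its name add_zero_height_copies and its copies argument (the number of extra copies to hang on each leaf) are hypothetical. It assumes hc was built on the de-duplicated objects, chains each copy onto its original at height 0, adds the number of new merges to every old step number, and rebuilds order so each copy sits right after its original.

```r
## Hypothetical helper (not part of stats): hang copies[k] extra leaves on
## leaf k of an hclust object 'hc' built from the n de-duplicated objects.
## Each copy is joined at height 0; copies get leaf numbers n+1, n+2, ...
add_zero_height_copies <- function(hc, copies) {
  n <- length(hc$order)
  copies <- as.integer(copies)
  stopifnot(length(copies) == n, all(copies >= 0L))
  m <- sum(copies)                    # new leaves = new (height-0) merges
  new_rows <- matrix(0L, nrow = m, ncol = 2)
  top <- -seq_len(n)                  # node currently representing leaf k
  dup_of <- integer(0)                # original leaf behind each new leaf
  step <- 0L
  for (k in seq_len(n)) {
    for (i in seq_len(copies[k])) {
      step <- step + 1L
      leaf <- -(n + step)             # new leaf n + step, as a singleton
      ## hclust convention: singleton first; two singletons in index order
      new_rows[step, ] <- if (top[k] < 0L) c(top[k], leaf) else c(leaf, top[k])
      top[k] <- step
      dup_of <- c(dup_of, k)
    }
  }
  ## Renumber the original merges: singleton -k becomes top[k], and a
  ## reference to old step j becomes j + m.
  old <- hc$merge
  neg <- old < 0L
  old[neg] <- top[-old[neg]]
  old[!neg] <- old[!neg] + m
  ## restore the convention: singleton first, else smaller step first
  swap <- (old[, 1] > 0L & old[, 2] < 0L) |
          (old[, 1] > 0L & old[, 2] > 0L & old[, 1] > old[, 2])
  old[swap, ] <- old[swap, 2:1]
  hc$merge <- rbind(new_rows, old)
  hc$height <- c(rep(0, m), hc$height)
  ## each copy is placed immediately to the right of its original
  hc$order <- unlist(lapply(hc$order, function(k) c(k, n + which(dup_of == k))))
  ## copies reuse the original's label, if there are labels
  if (!is.null(hc$labels)) hc$labels <- c(hc$labels, hc$labels[dup_of])
  hc
}
```

For the question's data, one would cluster the five distinct objects a1, b, c, d, e1 and call add_zero_height_copies(hc, c(2, 0, 0, 0, 1)) before plot(). With copies = rep(1, 3) applied to res, it should give the same merge, height and order as res2 above.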


Alternatively, you could start from as.dendrogram(res) and manipulate
the resulting "dendrogram" object (a nested list) instead.

HTH,

Chuck



>
> Jopi Harri

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901




More information about the R-help mailing list