[R] Is it possible to obtain an agglomeration schedule with R cluster analysis

William Dunlap wdunlap at tibco.com
Sun Feb 24 00:27:31 CET 2013


You didn't show what the tabular summary should look like.
However, look at the height and merge components of
an hclust object:

> hc3 <- hclust(dist(USArrests[1:8, c(1,2,4)]))
> data.frame(hc3[2:1])
      height merge.1 merge.2
1   9.297849      -1      -8
2  13.609188      -2      -5
3  23.779193      -4      -6
4  33.865321      -3       2
5  48.229659       1       3
6 104.636227       4       5
7 185.135221      -7       6
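To see what the height values measure, here is a minimal sketch. It assumes hclust()'s default method = "complete", and the helper leaf_members() is hypothetical, not part of stats. It expands each merge row into the observations it contains and checks that each height equals the largest distance between the two groups being merged:

```r
# Rebuild the example (hclust's default method is "complete")
x   <- USArrests[1:8, c(1, 2, 4)]
hc3 <- hclust(dist(x))
d   <- as.matrix(dist(x))

# Hypothetical helper: expand a merge reference into observation numbers.
# Negative entries are single observations; positive entries point to an
# earlier row of hc$merge, which is expanded recursively.
leaf_members <- function(hc, ref) {
  if (ref < 0) return(-ref)
  c(leaf_members(hc, hc$merge[ref, 1]), leaf_members(hc, hc$merge[ref, 2]))
}

# Under complete linkage, the height of merge k should equal the largest
# distance between a member of one group and a member of the other.
complete_height <- sapply(seq_len(nrow(hc3$merge)), function(k) {
  a <- leaf_members(hc3, hc3$merge[k, 1])
  b <- leaf_members(hc3, hc3$merge[k, 2])
  max(d[a, b])
})

all.equal(complete_height, hc3$height)
```

With other linkage methods (e.g. "average") the height is a different summary of the between-group distances, so this particular check would not apply.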
The two merge.* columns identify which groups merged at
the corresponding height value.  A negative value, -i, refers to the
i'th leaf in the 'labels' component, and a positive value, i, refers
to the cluster created in the i'th row of the data.frame.  The following
function translates those references into names:

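f <- function(hc){
     # Leaf names; fall back to observation numbers if the data had no row names
     labs <- if (is.null(hc$labels)) as.character(seq_along(hc$order)) else hc$labels
     # Negative merge entries are leaves, positive entries are earlier clusters
     data.frame(row.names=paste0("Cluster",seq_along(hc$height)),
                height=hc$height,
                components=ifelse(hc$merge<0, labs[abs(hc$merge)], paste0("Cluster",hc$merge)),
                stringsAsFactors=FALSE)
}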

as in
> f(hc3)
             height components.1 components.2
Cluster1   9.297849      Alabama     Delaware
Cluster2  13.609188       Alaska   California
Cluster3  23.779193     Arkansas     Colorado
Cluster4  33.865321      Arizona     Cluster2
Cluster5  48.229659     Cluster1     Cluster3
Cluster6 104.636227     Cluster4     Cluster5
Cluster7 185.135221  Connecticut     Cluster6

Compare that to the output of str(as.dendrogram(hc3)):

> str(as.dendrogram(hc3))
--[dendrogram w/ 2 branches and 8 members at h = 185]
  |--leaf "Connecticut" 
  `--[dendrogram w/ 2 branches and 7 members at h = 105]
     |--[dendrogram w/ 2 branches and 3 members at h = 33.9]
     |  |--leaf "Arizona" 
     |  `--[dendrogram w/ 2 branches and 2 members at h = 13.6]
     |     |--leaf "Alaska" 
     |     `--leaf "California" 
     `--[dendrogram w/ 2 branches and 4 members at h = 48.2]
        |--[dendrogram w/ 2 branches and 2 members at h = 9.3]
        |  |--leaf "Alabama" 
        |  `--leaf "Delaware" 
        `--[dendrogram w/ 2 branches and 2 members at h = 23.8]
           |--leaf "Arkansas" 
           `--leaf "Colorado"
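The heights printed by str() are stored as the "height" attribute of each internal node of the dendrogram. A sketch, using a hypothetical helper node_heights(), that walks the dendrogram and checks that those attributes are the same numbers as hc3$height:

```r
hc3  <- hclust(dist(USArrests[1:8, c(1, 2, 4)]))
dend <- as.dendrogram(hc3)

# Hypothetical helper: recursively collect the "height" attribute of every
# internal (non-leaf) node, starting at the root.
node_heights <- function(node) {
  if (is.leaf(node)) return(numeric(0))
  below <- lapply(seq_along(node), function(i) node_heights(node[[i]]))
  c(attr(node, "height"), unlist(below))
}

h <- node_heights(dend)
all.equal(sort(h), hc3$height)
```

Sorting is needed because the walk visits nodes top-down, while hc$height is in merge order.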

Does f() produce the information you need for your display?
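Since the original question asked for something like SPSS's agglomeration schedule, the merge and height components can also be rearranged into that kind of layout. This is only an approximate sketch: agglomeration_schedule() and its column names are hypothetical, and naming a cluster by its lowest-numbered case is an assumption about how SPSS labels clusters, not something checked against SPSS output.

```r
hc3 <- hclust(dist(USArrests[1:8, c(1, 2, 4)]))

# Hypothetical helper: tabulate an hclust fit as an SPSS-style schedule.
# Assumed conventions (not taken from SPSS documentation):
#  - a cluster is named by the smallest observation number it contains;
#  - First1/First2 give the stage that formed each joined cluster (0 = single case);
#  - NextStage gives the stage that next absorbs this stage's cluster (0 = last).
agglomeration_schedule <- function(hc) {
  m <- hc$merge
  n_stage <- nrow(m)
  min_obs <- integer(n_stage)  # smallest observation number in each stage's cluster

  # Negative reference = single observation, positive = cluster from an earlier stage
  name_of  <- function(ref) if (ref < 0) -ref else min_obs[ref]
  stage_of <- function(ref) if (ref < 0) 0L else ref

  out <- data.frame(Stage = seq_len(n_stage), Cluster1 = 0L, Cluster2 = 0L,
                    Coefficient = hc$height, First1 = 0L, First2 = 0L,
                    NextStage = 0L)
  for (k in seq_len(n_stage)) {
    nm <- c(name_of(m[k, 1]), name_of(m[k, 2]))
    o <- order(nm)               # list the lower-numbered cluster first
    refs <- m[k, o]
    out$Cluster1[k] <- nm[o[1]]
    out$Cluster2[k] <- nm[o[2]]
    out$First1[k] <- stage_of(refs[1])
    out$First2[k] <- stage_of(refs[2])
    min_obs[k] <- min(nm)
  }
  # A stage's cluster is next used in the (unique) later row that refers to it
  out$NextStage <- vapply(seq_len(n_stage), function(k) {
    hit <- which(rowSums(m == k) > 0)
    if (length(hit) > 0) hit[1] else 0L
  }, integer(1))
  out
}

s <- agglomeration_schedule(hc3)
print(s)
```

Case numbers can be mapped back to names with hc3$labels if that reads better than observation numbers.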

Bill Dunlap
Spotfire, TIBCO Software
wdunlap at tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Bob Green
> Sent: Saturday, February 23, 2013 12:49 PM
> To: Uwe Ligges
> Cc: r-help at r-project.org
> Subject: Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analysis
> 
> Hello Uwe,
> 
> Thanks. Re-reading the hclust help page, I found that, using the
> 'USArrests' data from its examples, the command plot(hc1) will show the
> order in which cases joined. However, I still can't see how to obtain
> the height at which each case joined a cluster, or the height at
> which clusters merged.
> 
> 
> The dendrogram {stats} help page provides the following code, which
> produces the information that I require. However, what I would like
> to obtain is a table of the heights at which the clusters formed.
> 
>  > hc <- hclust(dist(USArrests), "ave")
>  > (dend1 <- as.dendrogram(hc)) # "print()" method
>  > str(dend1)          # "str()" method
> 
> I also found as.hclust, which plots what I want, but I still can't
> find a way to produce the actual height values being
> plotted, for example as a tabular summary.
> 
>   plot(hc) ;  mtext("hclust", side=1)
> 
> Any assistance is appreciated,
> 
> Bob
> 
> 
> 
> At 04:01 AM 24/02/2013, Uwe Ligges wrote:
> 
> 
> >On 22.02.2013 11:41, Bob Green wrote:
> >>Hello,
> >>
> >>In SPSS the cluster analysis output includes an agglomeration schedule,
> >>which details the stages when cases are joined.
> >>
> >>Is it possible to obtain such output when performing cluster analysis in
> >>R?  If so, I'd appreciate advice regarding how to obtain this information.
> >
> >
> >If you are talking about hierarchical clustering via hclust(), see ?hclust
> >It tells you that the relevant information is available inside the
> >object and you can even see it via the plot method.
> >
> >Uwe Ligges
> >
> >
> >
> >>
> >>Any assistance is appreciated,
> >>
> >>Regards
> >>
> >>Bob
> >>
> >>______________________________________________
> >>R-help at r-project.org mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> 


