[R] Enquiry about Hierarchical Clustering

Adaikalavan RAMASAMY ramasamya at gis.a-star.edu.sg
Sat Sep 27 11:05:43 CEST 2003


hclust() cannot handle missing values (NA) in the distance object
returned by dist().

dist() will return missing values if
1. all elements in a row are missing, or
2. for some pair of rows, every column has a missing value in at least
   one of the two rows.

In the former case, it is better to remove the all-missing row, as it
is completely uninformative. The latter is harder to detect, and I am
not sure how to deal with it.
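
One possible way to detect both cases, offered only as a sketch: the
matrix x below is made-up toy data, and the names all_na, shared and bad
are purely illustrative. It flags rows that are entirely missing, then
counts for every pair of remaining rows how many columns are observed in
both. Pairs with a count of zero are the ones for which dist() returns NA.

```r
# Toy data (hypothetical): row 3 is entirely missing, and rows 1 and 2
# never have a value in the same column.
x <- rbind(c( 1, NA,  3, NA),
           c(NA,  2, NA,  4),
           c(NA, NA, NA, NA),
           c( 5,  6,  7,  8))

# Case 1: rows in which every element is missing
all_na <- apply(is.na(x), 1, all)
x2 <- x[!all_na, , drop = FALSE]

# Case 2: for each pair of rows, count the columns observed in both.
obs <- ifelse(is.na(x2), 0, 1)     # 1 = observed, 0 = missing
shared <- obs %*% t(obs)           # shared[i, j] = jointly observed columns
bad <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)

print(which(all_na))   # rows of x that are entirely missing
print(bad)             # pairs of rows of x2 with no column in common

d <- dist(x2)
print(any(is.na(d)))   # TRUE here, so hclust(d) would not work
```

What to do with the pairs listed in bad (drop one of the two rows,
impute, or use a different dissimilarity) depends on the data and is
left open here.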

Here is how dist() calculates its output for the following data:

   NA    3    5
    2    4    6

dist( rbind( c(NA, 3, 5), c(2, 4, 6) ) ) = 1.732051
= sqrt( [ (6-5)^2 + (4-3)^2 ] * 3/2 )

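The factor 3/2 (3 columns in total, of which 2 are observed in both
rows) scales up the sum of squared differences to account for the
missing pair. The missing value is not replaced by an average.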
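
The arithmetic can be checked by hand in R. This is a small sketch using
the two rows from the example; the variable names (a, b, ok, ss,
by_hand, from_dist) are illustrative only.

```r
a <- c(NA, 3, 5)
b <- c( 2, 4, 6)

ok <- !is.na(a) & !is.na(b)       # columns observed in both rows
ss <- sum((a[ok] - b[ok])^2)      # sum of squared differences over those columns

# Scale by (total columns) / (columns actually used), then take the root
by_hand <- sqrt(ss * length(a) / sum(ok))

from_dist <- as.numeric(dist(rbind(a, b)))

print(by_hand)
print(from_dist)
print(isTRUE(all.equal(by_hand, from_dist)))
```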

Hope this helps.

--
Adaikalavan Ramasamy 



> Dear Sir,
> 
> This is Ms. Setsuko Kinoshita writing from Japan.
> 
> I have a question about missing values in hierarchical clustering.
> In earlier versions of R, hierarchical clustering could not be used
> on data with missing values. I used Euclidean distance and the complete
> linkage method, via "plot(hclust(dist()),hang=-1)".
> 
> How are missing values treated in hierarchical clustering in the
> latest version, R 1.7.1? For example, are they replaced by an average?
> 
> Yours Sincerely,
> 
> -----
> Setsuko Kinoshita
> 
> Social and Environmental Medicine,
> Graduate School of Comprehensive Human Sciences,
> University of Tsukuba
> 1-1-1, Tennoudai, Tsukuba,
> Ibaraki, 305-8575, Japan
> Tel&Fax: +81-29-853-3489
> E-mail:setsuko at epidemiology.md.tsukuba.ac.jp(office)
> E-mail:setsuko at mbj.ocn.ne.jp(private)
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
