[R] Cluster analysis with missing data

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Jul 14 10:41:03 CEST 2009


On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
> Hi folks,
> 
> I tried hclust for the first time. Unfortunately, with missing data in my
> data file, it doesn't seem
> to work. I found no information about how to handle missing data.
> 
> Omitting all cases with missing values is not really an option, as I would
> lose too many cases.

Holger,

hclust takes a dissimilarity matrix as input, not your data, so the
problem is in finding an appropriate dissimilarity/distance coefficient
that handles missing data.

One such measure is Gower's coefficient, which is implemented in function
'daisy' in the recommended package 'cluster'. Try:

require(cluster)
?daisy

to read about it.
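
As a minimal sketch of how the two steps fit together (the toy data
frame 'dat' and the choice of average linkage are made up for
illustration, not taken from your data), something along these lines
should work:

```r
library(cluster)

# Hypothetical toy data with scattered missing values
dat <- data.frame(x = c(1.0, 2.1, NA, 4.2, 5.0, 5.9),
                  y = c(0.5, NA, 1.4, 3.8, 4.1, NA),
                  z = c(2.0, 2.2, 1.9, NA, 6.0, 6.3))

# Gower dissimilarities; for each pair of rows only the variables
# observed in both rows contribute
d <- daisy(dat, metric = "gower")

# A pair of rows with no jointly observed variable gives NA,
# which hclust() will not accept, so check first
stopifnot(!anyNA(d))

# Cluster on the dissimilarities, not on the raw data
hc <- hclust(d, method = "average")

# Cut the tree into two groups (k = 2 is an arbitrary choice here)
groups <- cutree(hc, k = 2)
```

plot(hc) would then draw the dendrogram.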

Also, 'vegdist' in package 'vegan' can ignore missing values pairwise
when computing dissimilarities. See ?vegdist after loading 'vegan' and,
in particular, the 'na.rm' argument.
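
A comparable sketch with 'vegdist' might look like the following. The
toy matrix 'm' is hypothetical, and Euclidean distance is used only as
an example; which 'method' is appropriate depends on your data:

```r
library(vegan)

# Hypothetical toy data with scattered missing values
m <- cbind(x = c(1.0, 2.1, NA, 4.2, 5.0, 5.9),
           y = c(0.5, NA, 1.4, 3.8, 4.1, NA),
           z = c(2.0, 2.2, 1.9, NA, 6.0, 6.3))

# na.rm = TRUE asks for pairwise deletion of missing observations
d <- vegdist(m, method = "euclidean", na.rm = TRUE)

# Check that every pair of rows had something in common
stopifnot(!anyNA(d))

hc <- hclust(d, method = "average")
```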

Whether either of these (i.e. the resulting dissimilarities) makes sense
for your particular problem is another matter...

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



