[R] agnes clustering and NAs

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Jan 27 13:53:27 CET 2011


On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
> 
> On 27.01.2011 05:00, Dario Strbenac wrote:
> > Hello,
> >
> > In the documentation for agnes in the package 'cluster', it says that NAs are allowed, and sure enough it works for a small example like :
> >
> >> m<- matrix(c(
> > 1, 1, 1, 2,
> > 1, NA, 1, 1,
> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
> >> agnes(m)
> > Call:    agnes(x = m)
> > Agglomerative coefficient:  0.1614168
> > Order of objects:
> > [1] 1 2 3
> > Height (summary):
> >     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >    1.155   1.247   1.339   1.339   1.431   1.524
> >
> > Available components:
> > [1] "order"  "height" "ac"     "merge"  "diss"   "call"   "method" "data"
> >
> > But I have a large matrix (23371 rows, 50 columns) with some NAs in it and it runs for about a minute, then gives an error :
> >
> >> agnes(iMatrix)
> > Error in agnes(iMatrix) :
> >    No clustering performed, NA-values in the dissimilarity matrix.
> >
> > I've also tried getting rid of rows with all NAs in them, and it still gave me the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the claim made by its documentation.
> 
> 
> I haven't looked in the file, but you need to get rid of all NA, or in 
> other words, all rows that contain *any* NA values.

If one believes the documentation, then that only applies to the case
where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
data matrix or data frame.

The only way the OP could have gotten that error with the call shown is
if iMatrix were not a dissimilarity matrix inheriting from class "dist",
so `NA`s should be allowed.
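
A quick way to confirm which case applies is to test the class of the object directly. This is only a sketch on toy data: `raw` and `d` are made-up names standing in for iMatrix, and only base R is used.

```r
# Toy stand-in for iMatrix (made-up data, not the OP's file)
raw <- matrix(c(1, 1,  1, 2,
                1, NA, 1, 1,
                1, 2,  2, 2), nrow = 3, byrow = TRUE)

# The same data converted to a dissimilarity object with stats::dist()
d <- dist(raw)

# A raw data matrix does not inherit from "dist"; a dist object does
raw_is_dist <- inherits(raw, "dist")
d_is_dist   <- inherits(d, "dist")

print(c(raw = raw_is_dist, d = d_is_dist))
```

Running `class(iMatrix)` or `inherits(iMatrix, "dist")` on the real object would settle which branch of the documentation is relevant.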

My guess would be that the OP didn't get rid of all the `NA`s.

Dario: if iMatrix is a data frame, what does:

sapply(iMatrix, function(x) any(is.na(x)))

or if iMatrix is a matrix:

apply(iMatrix, 2, function(x) any(is.na(x)))

say?
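
Beyond per-column checks, it may also be worth looking at rows that are entirely `NA` and at pairs of rows that have no observed columns in common. The sketch below uses a made-up matrix `m` in place of iMatrix; `all_na_rows`, `shared` and `no_overlap` are illustrative names. It relies on the documented behaviour of base R's dist(), which returns `NA` for a pair of rows when every column is excluded for that pair. That agnes() fails for the same reason is an assumption I have not checked against its source.

```r
# Toy stand-in for iMatrix (made-up data): rows 1 and 2 share no observed column
m <- matrix(c(1,  NA, NA, 2,
              NA, 3,  4,  NA,
              1,  2,  2,  2), nrow = 3, byrow = TRUE)

# Columns and rows containing at least one NA
na_cols <- apply(m, 2, function(x) any(is.na(x)))
na_rows <- apply(m, 1, function(x) any(is.na(x)))

# Rows in which every value is NA
all_na_rows <- rowSums(!is.na(m)) == 0

# For each pair of rows, count the columns observed in both
obs    <- (!is.na(m)) * 1          # 1 = observed, 0 = missing
shared <- obs %*% t(obs)

# Pairs (row < col) with no shared observed column
no_overlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(no_overlap)

# Such pairs give NA entries in a dissimilarity computed by dist()
d_has_na <- any(is.na(dist(m)))
print(d_has_na)
```

If `no_overlap` has any rows for the real data, removing only the all-`NA` rows would not be enough to make every pairwise dissimilarity computable.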

G

> Uwe Ligges
> 
> 
> 
> > The matrix I'm using can be obtained here :
> > http://129.94.136.7/file_dump/dario/iMatrix.obj
> >
> > --------------------------------------
> > Dario Strbenac
> > Research Assistant
> > Cancer Epigenetics
> > Garvan Institute of Medical Research
> > Darlinghurst NSW 2010
> > Australia
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


