[R] Pam and Fanny vector length problems

Martin Maechler maechler at stat.math.ethz.ch
Fri Feb 28 12:15:51 CET 2003


>>>>> "Mark" == Mark Marques <mmarques at power.inescn.pt>
>>>>>     on Fri, 28 Feb 2003 09:51:02 +0000 writes:

    Mark> I have "small" problem ...
    Mark> with the cluster library  each time I try to use
    Mark> the "agnes","pam","fanny" functions with more than 20000 elements
    Mark> I get the following error:
    >> Error in vector("double", length) : negative length vectors are not allowed
    >> In addition: Warning message:
    >> NAs introduced by coercion

"negative" is certainly misleading here; I presume it's an
integer overflow somewhere.
But with agnes() I could never get close to that size; even
  a <- agnes(dist(cbind(1,rnorm(5000))))
pushes my R process up to a memory footprint of 638 MBytes...
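
To illustrate the kind of overflow I have in mind, here is a minimal sketch.
It does not reproduce what happens inside the cluster code; it only shows that
a size exceeding R's integer range turns into NA (with a warning). That such an
NA length is then reported as a "negative" length is my assumption, not
something I have verified. The variable names are made up for the example.

```r
# Largest value an R integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# Coercing a too-large double to integer yields NA and a warning.
too_big <- as.integer(3e9)

# Integer arithmetic that exceeds the range also yields NA and a warning.
overflowed <- 46341L * 46341L

# Doing the same arithmetic in double precision is safe.
safe <- 46341 * 46341

print(c(int_max = int_max, too_big = too_big, overflowed = overflowed))
print(safe)
```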

    Mark> But with the clara function everything works fine...

because clara() is designed for large applications!
In clustering, 20000 is definitely "large".
I would recommend using considerably larger `samples' and `sampsize'
values than the clara() defaults.
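
A minimal sketch of such a call, on simulated data. The data, the number of
clusters k = 3, and the particular values samples = 50 and sampsize = 500 are
assumptions chosen only for illustration; suitable values depend on your data.

```r
library(cluster)

set.seed(1)
n <- 20000
# Simulated two-column data standing in for the real 20000 observations.
x <- cbind(rnorm(n), rnorm(n))

# Draw more and larger subsamples than the defaults.
fit <- clara(x, k = 3, samples = 50, sampsize = 500)

# Cluster sizes over all n observations.
print(table(fit$clustering))
```

Only a dissimilarity object for each subsample of size sampsize is built,
which is why this stays feasible where the other routines do not.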

All routines except clara() work with a dissimilarity/distance object
of size n*(n-1)/2  (basically one of the triangles of a symmetric n^2 matrix).
The implementation needs at least one extra copy of this object, and
each double takes 8 bytes.
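
The arithmetic can be sketched as follows. The helper name dist_bytes() is
made up for this example, and the figures cover only the raw distance object,
not any further working storage the routines may need.

```r
# Bytes needed for one dissimilarity object on n observations:
# n*(n-1)/2 doubles at 8 bytes each.
dist_bytes <- function(n, bytes_per_double = 8) {
  n <- as.numeric(n)  # compute in double precision to avoid integer overflow
  n * (n - 1) / 2 * bytes_per_double
}

one_copy_5000  <- dist_bytes(5000)
one_copy_20000 <- dist_bytes(20000)

# One copy, and one copy plus a duplicate, in GiB for n = 20000.
print(one_copy_20000 / 2^30)
print(2 * one_copy_20000 / 2^30)
```

For n = 20000 a single copy is already about 1.6e9 bytes, so one duplicate
brings the total to roughly 3.2e9 bytes.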

    Mark> What could be wrong ?

You have no chance of getting anything from agnes() or pam()
when you want to cluster 20000 objects, at least not on 32-bit computers.

It does seem, though, that one could carefully change agnes() (for example)
to duplicate the large objects less often and so save memory.


Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><



