[R] Pam and Fanny vector length problems
    Martin Maechler 
    maechler at stat.math.ethz.ch
       
    Fri Feb 28 12:15:51 CET 2003
    
    
  
>>>>> "Mark" == Mark Marques <mmarques at power.inescn.pt>
>>>>>     on Fri, 28 Feb 2003 09:51:02 +0000 writes:
    Mark> I have "small" problem ...
    Mark> with the cluster library  each time I try to use
    Mark> the "agnes","pam","fanny" functions with more than 20000 elements
    Mark> I get the following error:
    >> Error in vector("double", length) : negative length vectors are not allowed
    >> In addition: Warning message:
    >> NAs introduced by coercion
"negative" is certainly misleading here; I presume it's an
integer overflow somewhere.
But (with agnes()) I could never even get close to that size; already
  a <- agnes(dist(cbind(1,rnorm(5000))))
pumps my R up to a memory footprint of 638 MBytes...
    Mark> But with the clara function everything works fine...
because clara() is for  large applications !!
In clustering, 20000 is definitely "large".
I would recommend using considerably larger `samples' and `sampsize'
values than the defaults in clara().
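
A minimal sketch of such a call, assuming the cluster package is
installed. The data, the number of clusters k = 3, and the particular
values samples = 50 and sampsize = 500 are illustrative assumptions only,
not tuned recommendations:

```r
library(cluster)

set.seed(1)                            # reproducible toy data
n <- 20000
x <- matrix(rnorm(2 * n), ncol = 2)    # hypothetical data: 20000 objects, 2 variables

# The defaults are samples = 5 and sampsize = min(n, 40 + 2 * k);
# both are raised here well above those defaults.
cl <- clara(x, k = 3, samples = 50, sampsize = 500)

table(cl$clustering)                   # cluster sizes over all n objects
```

clara() only ever builds dissimilarities within one sub-sample of size
sampsize at a time, so memory use is governed by sampsize rather than n.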
All routines but clara() work with a dissimilarity/distance object
of size n*(n-1)/2  (basically one of the triangles of a symmetric n^2 matrix).
The implementation will need to duplicate this object at least once, and
one double is 8 bytes.
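
To put numbers on this, here is a small sketch of the arithmetic. The
helper name dist_bytes is made up for illustration; it counts only the
raw doubles of a single copy and ignores any other overhead:

```r
# Bytes needed for ONE copy of the n*(n-1)/2 dissimilarities.
# n is kept as a double on purpose, so the product cannot overflow
# R's 32-bit integer type.
dist_bytes <- function(n, bytes_per_double = 8) {
  n * (n - 1) / 2 * bytes_per_double
}

sizes <- c(5000, 20000)
data.frame(n       = sizes,
           entries = sizes * (sizes - 1) / 2,
           MiB     = round(dist_bytes(sizes) / 2^20, 1))
```

For n = 20000 that is 199,990,000 doubles, i.e. about 1.6 GB for a single
copy, before any duplication.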
    Mark> What could be wrong ?
You have no chance of getting anything from agnes() or pam()
when you want to cluster 20'000 objects, at least not on 32-bit computers.
It does seem, though, that one could carefully change agnes() (e.g.) to
duplicate the large objects less often and so save memory.
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
    
    