[R] clara - memory limit
    Prof Brian Ripley 
    ripley at stats.ox.ac.uk
       
    Wed Aug  3 19:18:27 CEST 2005
    
    
  
From the help page:
      'clara' is fully described in chapter 3 of Kaufman and Rousseeuw
      (1990). Compared to other partitioning methods such as 'pam', it
      can deal with much larger datasets.  Internally, this is achieved
      by considering sub-datasets of fixed size ('sampsize') such that
      the time and storage requirements become linear in n rather than
      quadratic.
and the default for 'sampsize' is apparently at least nrow(x).
So you need to set 'sampsize' (and perhaps 'samples') appropriately.
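For example, a minimal sketch (the values of 'samples' and 'sampsize' here are
purely illustrative, not recommendations; 'mydata' and k = 7 are taken from the
message below):

    library(cluster)
    ## keep each sub-dataset of modest, fixed size so that the time and
    ## storage requirements stay linear in nrow(mydata)
    my.clara.7k <- clara(mydata, k = 7, samples = 50, sampsize = 500)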
On Wed, 3 Aug 2005, Nestor Fernandez wrote:
> Dear all,
>
> I'm trying to estimate clusters from a very large dataset using clara but the
> program stops with a memory error. The (very simple) code and the error:
>
> mydata<-read.dbf(file="fnorsel_4px.dbf")
> my.clara.7k<-clara(mydata,k=7)
>
>> Error: cannot allocate vector of size 465108 Kb
>
> The dataset contains >3,000,000 rows and 15 columns. I'm using a Windows
> computer with 1.5G RAM; I also tried changing the memory limit to the maximum
> possible (4000M).
Actually, the limit is probably 2048M: see the rw-FAQ Q on memory limits.
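(On Windows the current limit can be queried, and raised only up to what the
build and OS allow, from within R; a sketch, with sizes in Mb:

    memory.limit()             # report the current memory limit, in Mb
    memory.limit(size = 2047)  # request close to the 2048M address-space limit

whether anything above that is usable is covered by the rw-FAQ entry.)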
> Is there a way to calculate clara clusters from such large datasets?
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
    
    