[R] loop over large dataset

Mon Jul 4 16:15:27 CEST 2005

Federico Calboli <f.calboli at imperial.ac.uk> writes:

> > behaviour, e.g. because gc() is called more frequently. And of
> > course, gc() needs some time, hence you get the expected increase
> > in runtime. This answers your other question as well.
> 
> Is it then internal gc() calls that increase the runtime from 5 minutes
> to more than 24 hours for a 27x increase in data (given that the code
> is exactly the same)?

Your original code got lost in the threading, but that order of
magnitude suggests that you have N^2/2 behaviour somewhere. The typical
culprit is code like

x <- numeric(0)
for (i in 1:N){
  newx <- <<....>>
  x <- c(x, newx)
} 

in which each extension of x causes the whole vector to be reallocated
and copied, so the total copying over N iterations grows like N^2/2.
The same applies to cbind and rbind constructs, of course.
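
For illustration, here is a minimal sketch of the growing pattern next to two
common ways of avoiding it. The original code is not available, so the
function names (grow, prealloc, bind_once) and the per-iteration computation
(i^2) are hypothetical stand-ins, and the sketch assumes the final length N is
known before the loop starts.

```r
# Growing pattern: every c() call allocates a new vector and copies x into it.
grow <- function(N) {
  x <- numeric(0)
  for (i in seq_len(N)) {
    newx <- i^2            # hypothetical stand-in for the real computation
    x <- c(x, newx)
  }
  x
}

# Preallocated: allocate the result once, then fill it by index.
prealloc <- function(N) {
  x <- numeric(N)
  for (i in seq_len(N)) {
    x[i] <- i^2            # same stand-in computation as above
  }
  x
}

# rbind analogue: collect the rows in a preallocated list, bind once at the end.
bind_once <- function(N) {
  rows <- vector("list", N)
  for (i in seq_len(N)) {
    rows[[i]] <- c(i, i^2)
  }
  do.call(rbind, rows)
}
```

Comparing system.time(grow(N)) with system.time(prealloc(N)) for increasing N
should show whether this kind of copying is what dominates the runtime; if
the final size is not known in advance, collecting the pieces in a list and
combining them once after the loop is a common alternative.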

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907