[R] loop over large dataset
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Mon Jul 4 16:15:27 CEST 2005
Federico Calboli <f.calboli at imperial.ac.uk> writes:
> > behaviour, e.g. because gc() is called more frequently. And of
> > course, gc() needs some time, hence you get the expected increase
> > in runtime. This answers your other question as well.
>
> Is it then internal gc() calls that increase the runtime from 5 minutes
> to more than 24 hours for a 27x increase in data (given that the code
> is exactly the same)?
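Your original code got lost in the threading, but that order of
magnitude suggests that you have N^2/2 behaviour somewhere: a 27x
increase in data then means 27^2 = 729 times the work, and 729 x 5
minutes is roughly 60 hours. The typical culprit is code like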
    x <- numeric(0)
    for (i in 1:N){
        newx <- <<....>>
        x <- c(x, newx)
    }
in which each extension of x causes the whole vector to be reallocated
and copied, so filling it costs on the order of N^2/2 element copies in
total. The same applies to cbind and rbind constructs, of course.
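A minimal sketch of the usual way around this, assuming the final length N is known in advance: allocate the result once and fill it by index. The helper compute_newx() is a hypothetical stand-in for the elided per-iteration computation, not anything from the original code, and N = 10000 is an arbitrary size chosen for illustration. For the rbind case, one common approach is to collect the rows in a preallocated list and bind them once at the end.

```r
# Hypothetical stand-in for the elided per-iteration computation.
compute_newx <- function(i) i^2

N <- 10000  # arbitrary size, for illustration only

# Growing: x is reallocated and copied on every pass through the loop.
grow <- function(N) {
  x <- numeric(0)
  for (i in seq_len(N)) {
    x <- c(x, compute_newx(i))
  }
  x
}

# Preallocating: storage is reserved once, then filled by index.
prealloc <- function(N) {
  x <- numeric(N)
  for (i in seq_len(N)) {
    x[i] <- compute_newx(i)
  }
  x
}

x_grow <- grow(N)
x_pre  <- prealloc(N)

# Both approaches produce the same vector; only the cost differs.
stopifnot(identical(x_grow, x_pre))

# rbind analogue: gather rows in a preallocated list, bind once at the end
# instead of calling rbind() inside the loop.
rows <- vector("list", N)
for (i in seq_len(N)) {
  rows[[i]] <- c(i, compute_newx(i))
}
m <- do.call(rbind, rows)
```

Wrapping grow(N) and prealloc(N) in system.time() for a few increasing values of N is one way to see how differently the two versions scale on a given machine.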
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907