[R] loop over large dataset
Federico Calboli
f.calboli at imperial.ac.uk
Mon Jul 4 16:22:37 CEST 2005
On 4 Jul 2005, at 15:15, Peter Dalgaard wrote:
>
> Your original code got lost in the threading, but that order of
> magnitude suggests that you have N^2/2 behaviour somewhere. The
> typical culprit is code like
>
> x <- numeric(0)
> for (i in 1:N){
> newx <- <<....>>
> x <- c(x, newx)
> }
>
> in which the extension of x causes the whole thing to be reallocated
> and copied. Same thing with cbind and rbind constructs of course.
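
The pattern Peter describes can be sketched as follows. This is only an illustration: N, the i^2 placeholder computation and the function names grow() and prealloc() are made up here, since the original loop body is not shown in the thread. The first function extends the vector on every iteration; the second allocates it once and fills it in place.

```r
# Hypothetical example: N and the i^2 computation are illustrative only.
N <- 10000

# Growing the vector: each c() call allocates a new vector
# and copies the old contents into it.
grow <- function(N) {
  x <- numeric(0)
  for (i in seq_len(N)) {
    x <- c(x, i^2)
  }
  x
}

# Preallocating: the vector is created once at full length
# and each element is assigned by index.
prealloc <- function(N) {
  x <- numeric(N)
  for (i in seq_len(N)) {
    x[i] <- i^2
  }
  x
}

x_grow <- grow(N)
x_pre <- prealloc(N)

# Both approaches should give the same result.
stopifnot(identical(x_grow, x_pre))
```

Wrapping each call in system.time() should show the difference growing with N, though the exact timings will depend on the machine and the R version.
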
I changed my code a bit, and now the runtime is down to less than a
minute (from more than 24 hours). I was copying a large dataset many
times over; when I extracted the columns I need as independent vectors,
the runtime dropped like a stone.
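
A minimal sketch of that kind of change, under stated assumptions: the actual code is not shown in this thread, so the data frame dat, its columns a and b, and the two functions below are hypothetical. The first version indexes the data frame inside the loop; the second pulls the needed columns out as plain vectors once and loops over those.

```r
# Hypothetical data: column names and sizes are made up for illustration.
n <- 2000
dat <- data.frame(a = as.numeric(seq_len(n)),
                  b = as.numeric(rev(seq_len(n))),
                  c = rep(1, n))

# Working on the data frame inside the loop: every dat[i, "a"]
# goes through data frame indexing, which is costly per call.
slow_sum <- function(dat) {
  total <- 0
  for (i in seq_len(nrow(dat))) {
    total <- total + dat[i, "a"] * dat[i, "b"]
  }
  total
}

# Extracting the needed columns as independent vectors first,
# then indexing the plain vectors inside the loop.
fast_sum <- function(dat) {
  a <- dat$a
  b <- dat$b
  total <- 0
  for (i in seq_along(a)) {
    total <- total + a[i] * b[i]
  }
  total
}

res_slow <- slow_sum(dat)
res_fast <- fast_sum(dat)

# Both versions should compute the same value.
stopifnot(isTRUE(all.equal(res_slow, res_fast)))
```

For a computation this simple, sum(dat$a * dat$b) would avoid the loop altogether; the loop is kept here only to mirror the situation described above.
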
Cheers,
Federico
--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 75941602 Fax +44 (0)20 75943193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com