[R] loop over large dataset

Federico Calboli f.calboli at imperial.ac.uk
Mon Jul 4 16:22:37 CEST 2005


On 4 Jul 2005, at 15:15, Peter Dalgaard wrote:
>
> Your original code got lost in the threading, but that order of
> magnitude suggests that you have N^2/2 behaviour somewhere. The
> typical culprit is code like
>
> x <- numeric(0)
> for (i in 1:N){
>   newx <- <<....>>
>   x <- c(x, newx)
> }
>
> in which the extension of x causes the whole thing to be reallocated
> and copied. Same thing with cbind and rbind constructs of course.


I changed my code a bit, and now the runtime is down to less than a
minute (from more than 24 hours). I was copying a large dataset many
times over; when I extracted the columns I need as independent vectors,
the runtime dropped like a stone.
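
A minimal sketch of that kind of change, not the original code: the data frame dat, its columns a and b, and the two functions are hypothetical names used only for illustration. The idea is to pull the needed columns out once as plain vectors, so the loop never touches the full dataset.

```r
# Hypothetical data; the names dat, a and b are illustrative only.
n <- 2000
dat <- data.frame(a = as.numeric(seq_len(n)),
                  b = as.numeric(rev(seq_len(n))))

# Slower pattern: index into the data frame on every iteration.
slow_sum <- function(dat) {
  total <- 0
  for (i in seq_len(nrow(dat))) {
    total <- total + dat[i, "a"] * dat[i, "b"]
  }
  total
}

# Faster pattern: extract the needed columns once, then loop over the vectors.
fast_sum <- function(dat) {
  a <- dat$a
  b <- dat$b
  total <- 0
  for (i in seq_along(a)) {
    total <- total + a[i] * b[i]
  }
  total
}

# The two versions agree; only the access pattern differs.
stopifnot(isTRUE(all.equal(slow_sum(dat), fast_sum(dat))))
```

In a case this simple the loop could be dropped entirely with sum(dat$a * dat$b); the loop is kept here only to mirror the situation in the thread.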

Cheers,

Federico

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



