[R] memory management question [Broadcast]

Charles C. Berry cberry at tajo.ucsd.edu
Tue Feb 20 18:24:10 CET 2007


On Tue, 20 Feb 2007, Federico Calboli wrote:

> Liaw, Andy wrote:
>>  I don't see why making copies of the columns you need inside the loop is
>>  "better" memory management.  If the data are in a matrix, accessing
>>  elements is quite fast.  If you're worrying about speed of that, do what
>>  Charles suggests: work with the transpose so that you are accessing
>>  elements in the same column in each iteration of the loop.
>
> As I said, this is pretty academic; I am not looking for how to do something
> differently.
>
> Having said that, let me present this code:
>
> for(i in gp){
>    new[i,1] = ifelse(srow[i]>0, new[srow[i],zippo[i]], sav[i])
>    new[i,2] = ifelse(drow[i]>0, new[drow[i],zappo[i]], sav[i])
>  }
>
> where gp is a large vector and srow and drow are the dummy variables for:
>
> srow = data[,2]
> drow = data[,4]
>
> If instead of the dummy variables I access the array directly (and it's a
> 600000 x 6 array) the loop takes 2 or 3 days; I am not sure, as I killed it
> after 48 hours.
>
> If I use dummy variables the code runs in 10 minutes-ish.
>
> Comments?


This is a bit different than your original post (where it appeared that 
you were manipulating one row of a matrix at a time), but the issue is the 
same.

As suggested in my earlier email this looks like a caching issue, and this 
is not peculiar to R.

Viz.

"Most modern CPUs are so fast that for most program workloads the locality 
of reference of memory accesses, and the efficiency of the caching and 
memory transfer between different levels of the hierarchy, is the 
practical limitation on processing speed. As a result, the CPU spends much 
of its time idling, waiting for memory I/O to complete."

(from http://en.wikipedia.org/wiki/Memory_hierarchy)
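
The effect described in that quotation can be sketched in R itself. What follows is only an illustration under stated assumptions: the vector 'x', its length, and the two index vectors are made up for the example and are not taken from the code in this thread. Both sums touch exactly the same elements; only the order of the memory accesses differs. Timings depend on the machine, so none are quoted here.

```r
# Illustrative sketch: same elements, different access order.
# 'x', 'n', 'sequential' and 'scattered' are hypothetical names.
set.seed(42)
n <- 1e6
x <- runif(n)

sequential <- seq_len(n)     # neighbouring elements, good locality
scattered  <- sample.int(n)  # the same indices in random order

t_seq <- system.time(s1 <- sum(x[sequential]))
t_rnd <- system.time(s2 <- sum(x[scattered]))

# Elapsed seconds for each access pattern (machine dependent)
print(c(sequential = t_seq[["elapsed"]], scattered = t_rnd[["elapsed"]]))
```

The two sums agree up to floating-point rounding, so any difference in elapsed time comes from how the elements are fetched rather than from the arithmetic.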


The computation you have is hard on your cache, and dropping the unused
columns of your 'data' object by assigning the columns you use to 'srow'
and 'drow' has lightened the load.
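
A minimal sketch of that comparison follows. It assumes 'data' is an integer matrix and uses a far smaller row count than the 600000 rows in the post; the function names 'direct' and 'via_vector' are invented for the illustration, and the loop body is reduced to a plain lookup rather than the full update of 'new'. How large the gap is, and how much of it is due to the cache, will depend on the machine and the R version.

```r
# Illustrative sketch: index the full matrix inside the loop versus
# copying the needed column into a vector first.
set.seed(1)
n <- 20000L  # deliberately much smaller than the 600000 rows in the post
data <- matrix(sample.int(n, n * 6L, replace = TRUE), nrow = n, ncol = 6L)

# Variant A: reach into the 6-column matrix at every iteration
direct <- function(data) {
  out <- integer(nrow(data))
  for (i in seq_len(nrow(data))) out[i] <- data[i, 2L]
  out
}

# Variant B: extract the column once, then loop over the short vector
via_vector <- function(data) {
  srow <- data[, 2L]
  out <- integer(length(srow))
  for (i in seq_along(srow)) out[i] <- srow[i]
  out
}

t_direct <- system.time(a <- direct(data))
t_vector <- system.time(b <- via_vector(data))
print(c(direct = t_direct[["elapsed"]], via_vector = t_vector[["elapsed"]]))
```

Both variants return the same values; only the object touched inside the loop differs.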

If you do not know why SAXPY and friends are written as they are, a little 
bit of study will be rewarded by a much better understanding of these 
issues. I think Golub and Van Loan's 'Matrix Computations' touches on this 
(but I do not have my copy close to hand to check).
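
For readers who have not met it: SAXPY is the BLAS routine that computes a*x + y for a scalar a and vectors x and y. The sketch below is an illustration in R, not the BLAS code itself, and the function names are invented for the example. It builds a matrix-vector product in two ways: from saxpy-style updates that walk down one column at a time, and from dot products that walk along one row at a time. R stores matrices column by column, so the first version reads contiguous memory while the second reads elements nrow(A) apart.

```r
# Illustrative sketch: column-oriented versus row-oriented matrix-vector
# product. All function names here are hypothetical.

# saxpy-style update on whole vectors: a * x + y
saxpy <- function(a, x, y) a * x + y

# y = A x as a sequence of saxpy updates, one contiguous column per step
matvec_by_columns <- function(A, x) {
  y <- numeric(nrow(A))
  for (j in seq_len(ncol(A))) y <- saxpy(x[j], A[, j], y)
  y
}

# y = A x as one dot product per row; each row is strided in memory
matvec_by_rows <- function(A, x) {
  y <- numeric(nrow(A))
  for (i in seq_len(nrow(A))) y[i] <- sum(A[i, ] * x)
  y
}

set.seed(7)
A <- matrix(rnorm(300 * 200), nrow = 300, ncol = 200)
x <- rnorm(200)
y_col <- matvec_by_columns(A, x)
y_row <- matvec_by_rows(A, x)
```

The two functions agree with each other and with A %*% x up to rounding; the difference between them is purely in the order in which the elements of A are visited.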


>
> Best,
>
> Fede
>
> -- 
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901


