[R-sig-hpc] snow and foreach memory issue?

Martin Morgan mtmorgan at fhcrc.org
Sun Dec 13 19:47:22 CET 2009


Martin Morgan wrote:
> Zhang, Ivan wrote:
>> Hi everyone,
>>
>> I have a question regarding RAM usage of snow and foreach.
>> I am running Windows XP with 4 GB RAM and an Intel quad-core 2.66 GHz.
>>
>> I recently tried to implement multicore processing using 'multicore' and
>> 'foreach', but realized that multicore didn't work well on Windows, so I
>> switched to using 'snow' and 'foreach', which works nicely.
>>
>> I hashed out my own method without %dopar%; however, for some reason it
>> was eating up my RAM really quickly. I wanted to see if anyone can
>> figure out why. Perhaps there is something I don't understand about
>> snow, as I just recently started using it.
>>
>> Suppose X_mat is a series of regressors for Y.
>>
>> datapoints is a very large matrix of points.
>>
>> The pseudo code is as follows: 
>> someFunc = function(cl, ...) {
>>
>>   for (i in 1:n) {
>>     model = generateModel(Y[, i], X_mat)
>>     dp = iter(datapoints, by = 'row',
>>               chunksize = floor(nrow(datapoints) / NUMCORES))
>>
>>     # couldn't figure out how to get cluster export to work within a function
>>     assign("model", model, .GlobalEnv)
>>     clusterExport(cl, "model")        # this only reads from globalenv?
>>
>>     pred <- do.call(c, clusterApply(cl, as.list(dp), function(x)
> 
> as.list(iter(datapoints, <etc>)) makes a full copy of datapoints.
> clusterApply creates another copy, distributed across cores. So if the
> workers are on the same machine you now use 3 * sizeof(datapoints) memory,
> even before any calculations are done on the workers. Ouch.
> 
> A first approach might be iter(datapoints, chunksize=nrow(datapoints) / N)
> coupled with clusterApplyLB -- there will be N chunks (assuming N >
> NUMCORES), and clusterApplyLB will only ever have in play a portion
> NUMCORES / N of datapoints (clusterApply would divide the N chunks into
> two groups, again forwarding the entire data to the workers!), so the
> memory use will be (2 + NUMCORES / N) * sizeof(datapoints).
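
Concretely, that first approach might look something like the untested,
top-level sketch below (cl is the snow cluster, model has already been
built, and N > NUMCORES is a chunk count you choose; the as.list() copy is
still made, which is where the "2" in the memory estimate comes from):

library(snow)
library(iterators)

N <- 4 * NUMCORES                        # more chunks than workers
chunks <- as.list(iter(datapoints, by = "row",
                       chunksize = ceiling(nrow(datapoints) / N)))
clusterExport(cl, "model")               # 'model' must sit in the global env
pred <- do.call(c, clusterApplyLB(cl, chunks,
                                  function(x) predict(model, x)))

clusterApplyLB() hands a chunk to whichever worker is free, so only about
NUMCORES / N of datapoints is in transit at any one time.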
> 
> A better approach is to avoid the duplication implied by
> as.list(iter(<etc>)). This would require an implementation like
> snow::dynamicClusterApply, where the first NUMCORES chunks of iter() are
> forwarded to the workers, and then a loop is entered where the manager
> receives one result and forwards the next chunk to the worker that
> provided the result. Memory use would then be (1 + NUMCORES / N) *
> sizeof(datapoints). Presumably this is the strategy taken by %dopar%.
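
For what it's worth, here is a rough, untested sketch of that
dynamic-dispatch loop. It leans on snow's low-level sendCall() and
recvOneResult(), the helpers behind dynamicClusterApply; they may not be
exported, so treat this as an illustration of the strategy rather than
drop-in code. Chunks are pulled lazily from the iterator, so datapoints is
never expanded into a full list on the manager.

library(snow)
library(iterators)

dynamicPredict <- function(cl, model, datapoints, chunksize) {
    it <- iter(datapoints, by = "row", chunksize = chunksize)
    nextChunk <- function() tryCatch(nextElem(it), error = function(e) NULL)

    ## ship only the model with the worker function, not the whole frame
    fun <- function(x) predict(model, x)
    e <- new.env(parent = globalenv())
    e$model <- model
    environment(fun) <- e

    results <- list(); job <- 0L; active <- 0L

    ## prime each worker with one chunk
    for (node in seq_along(cl)) {
        chunk <- nextChunk()
        if (is.null(chunk)) break
        job <- job + 1L; active <- active + 1L
        sendCall(cl[[node]], fun, list(chunk), tag = job)
    }

    ## as each result arrives, hand that worker the next chunk
    while (active > 0L) {
        d <- recvOneResult(cl)
        results[d$tag] <- list(d$value)
        active <- active - 1L
        chunk <- nextChunk()
        if (!is.null(chunk)) {
            job <- job + 1L; active <- active + 1L
            sendCall(cl[[d$node]], fun, list(chunk), tag = job)
        }
    }
    do.call(c, results)
}

Something like pred <- dynamicPredict(cl, model, datapoints,
floor(nrow(datapoints) / (4 * NUMCORES))) then keeps only a fraction of
datapoints on the wire at any moment.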
> 
> multicore should be the winner here, though, since all workers should
> have access to the data without copying -- sizeof(datapoints) memory
> use. I haven't used multicore extensively, and especially not on
> Windows. When you say "it didn't work well", it would be helpful to
> understand why. My limited experimentation suggested no problems when
> used with data sets that were not too close to the windows memory
> limits. Perhaps you are really just running out of memory, and multicore
> is not reporting this as nicely as it could? I'm sure the multicore
> author would appreciate something more precise in terms of user experience.
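
For completeness, a minimal multicore sketch (untested, and assuming a
fork-capable OS, i.e. not the Windows XP setup above): only small index
vectors are passed to the workers, and the forked children see datapoints
and model through the shared (copy-on-write) address space, so no extra
full copies are made up front.

library(multicore)   # unix-alikes only

idx <- split(seq_len(nrow(datapoints)),
             cut(seq_len(nrow(datapoints)), NUMCORES, labels = FALSE))
pred <- do.call(c, mclapply(idx,
                            function(i) predict(model, datapoints[i, , drop = FALSE]),
                            mc.cores = NUMCORES))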
> 
> A final consideration is that calculations on the workers are likely to
> duplicate a subset of datapoints, so that actual memory use will include
> an additional component that scales approximately linearly with
> NUMCORES. If the worker computations are memory intensive, then you'll
> quickly find yourself in trouble again.
> 
> Hope that helps,
> 
> Martin
> 
> 
>>       predict(model, x)))
>>
>>     ...
>>   }
>> }
>>
>> When I ran this code, my RAM usage would climb from the initial 2.5 GB
>> in increments of about 500 MB after each run, until it ate up all 4 GB.
>> So for n >= 3, the computer would throttle.

I guess my previous response doesn't really address this pattern; it
suggests that something is being retained across iterations of the for()
loop, perhaps because in the code you don't show us (...) you retain a
reference to a 500 MB object (iter(datapoints)? pred?) that is not
retained in the code below.
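
If that is what's happening, a quick (untested) way to check is to drop the
big per-iteration objects explicitly and watch gc(); saveResult() below is a
hypothetical stand-in for whatever you do with pred, and chunks is the chunk
list from the sketch earlier in this message:

for (i in 1:n) {
    model <- generateModel(Y[, i], X_mat)
    clusterExport(cl, "model")
    pred <- do.call(c, clusterApplyLB(cl, chunks,
                                      function(x) predict(model, x)))
    saveResult(pred, i)   # hypothetical: write results out rather than
                          # accumulating them across iterations
    rm(model, pred)       # drop references to the large objects ...
    print(gc())           # ... and check that the memory really comes back
}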

Martin

>> When I found out about the new registerDoSnow, it improved my
>> performance (props to Stephen Weston). Here's the pseudo code for the
>> equivalent of the above.
>>
>> someFunc = function(cl, ...) {
>>   registerDoSnow(cl)
>>   for (i in 1:n) {
>>     model = generateModel(Y[, i], X_mat)
>>     pred <- foreach(dp = iter(datapoints, by = 'row',
>>                               chunksize = floor(nrow(datapoints) / NUMCORES)),
>>                     .combine = c, .verbose = TRUE) %dopar% {
>>       predict(model, dp)
>>     }
>>   }
>> }
>>
>> Aside from slightly shorter lines, the performance was more stable: RAM
>> usage ran from 2.5 up to roughly 3.2 GB and stayed there, and it
>> performed better because it wouldn't run out of cache.
>>
>> However, I just want to understand what the difference is between the
>> two treatments that would make such a large difference, and whether I
>> am doing something wrong in the first example.
>>
>> Thanks,
>>
>> -Ivan
>>
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
