[R-sig-hpc] snow and foreach memory issue?

Stephen Weston stephen.b.weston at gmail.com
Mon Dec 14 16:15:46 CET 2009


It looks to me like the matrix datapoints might be getting captured in
the function closure that you're passing to clusterApply.  That depends
on where datapoints is defined, which you don't show. That could
cause serious performance and memory problems.

You can fix that either by defining the function in the global
environment, or by resetting its enclosing environment after you define
it, as follows:

    fun <- function(x) predict(model, x)
    environment(fun) <- globalenv()  # keep the local frame (and datapoints) out of the closure

You should still be able to access "model", since you've exported it
to the global environment of the workers.

Actually, it looks to me like "model" would be captured in the
closure, as well.  That could be another problem.
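
Incidentally, since clusterExport (as you note in your code comments)
only reads from the manager's .GlobalEnv, one alternative to the
assign()-into-.GlobalEnv workaround in your code is to push the locally
created model to the workers yourself with clusterCall.  A rough,
untested sketch:

    pushModel <- function(m) { assign("model", m, envir = .GlobalEnv); NULL }
    environment(pushModel) <- globalenv()    # again, don't drag the local frame along
    clusterCall(cl, pushModel, model)        # assigns "model" in each worker's global env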

Also, for the snow version, I would use the splitRows function to split
datapoints.  There isn't really any point in creating the iterator just
to turn it back into a list, since splitRows was designed for exactly
that purpose.  You might also want to try snow's parRapply function.
But you still need to address the capture of datapoints and model, if
that is indeed happening.
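
For the splitRows route, a rough, untested sketch (reusing datapoints,
NUMCORES, and model from your pseudocode, and assuming "model" has
already been made available on the workers as above):

    library(snow)

    chunks <- splitRows(datapoints, NUMCORES)   # one row block per worker
    fun <- function(x) predict(model, x)        # "model" is looked up on the workers
    environment(fun) <- globalenv()             # keep the local frame out of the closure
    pred <- do.call(c, clusterApply(cl, chunks, fun))

Alternatively, parRapply(cl, datapoints, fun) does the row splitting for
you, but note that it applies fun to each individual row rather than to
a block of rows.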

- Steve



On Sun, Dec 13, 2009 at 1:47 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> Martin Morgan wrote:
>> Zhang, Ivan wrote:
>>> Hi everyone,
>>>
>>> I have a question regarding the RAM usage of snow and foreach.
>>> I am running Windows XP with 4 GB of RAM and an Intel quad-core 2.66 GHz.
>>>
>>> I recently tried to implement multicore processing using 'multicore' and
>>> 'foreach', until I realized that multicore didn't work well on Windows
>>> and switched to using 'snow' and 'foreach', which works nicely.
>>>
>>> I hashed out my own method without dopar; however, for some reason it was
>>> eating up my RAM really quickly.  I wanted to see if anyone can figure out
>>> why; perhaps there is something I don't understand about snow, as I just
>>> recently started using it.
>>>
>>> Suppose X_mat is a series of regressors for Y.
>>>
>>> Datapoints is a very large matrix of points.
>>>
>>> The pseudo code is as follows:
>>> someFunc = function(cl, ...) {
>>>
>>>   for (i in 1:n) {
>>>     model <- generateModel(Y[, i], X_mat)
>>>     dp <- iter(datapoints, by = 'row',
>>>                chunksize = floor(nrow(datapoints) / NUMCORES))
>>>
>>>     assign("model", model, .GlobalEnv)  # couldn't figure out how to get
>>>                                         # clusterExport to work within a function
>>>     clusterExport(cl, "model")          # this only reads from globalenv?
>>>
>>>     pred <- do.call(c, clusterApply(cl, as.list(dp), function(x)
>>
>> as.list(iter(datapoints, <etc>)) makes a full copy of datapoints.
>> clusterApply creates another copy, distributed across cores. So if
>> they're on the same machine you now use 3 * sizeof(datapoints) memory,
>> even before any calculations are done on the workers. Ouch.
>>
>> A first approach might be iter(datapoints, chunksize = nrow(datapoints) / N)
>> coupled with clusterApplyLB -- there will be N chunks (assuming N >
>> NUMCORES), and clusterApplyLB will only ever have in play a portion
>> NUMCORES / N of datapoints (clusterApply would divide the N chunks into
>> two groups, again forwarding the entire data to the workers!), so the
>> memory use will be (2 + NUMCORES / N) * sizeof(datapoints).
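>>
>> A minimal, untested sketch of that first approach (reusing names from the
>> pseudocode; N, the number of chunks, is set arbitrarily to 4 * NUMCORES
>> here; as above, make sure the function sent to the workers doesn't drag
>> datapoints along in its environment if this runs inside someFunc):
>>
>>     library(iterators)
>>     N <- 4 * NUMCORES
>>     chunks <- as.list(iter(datapoints, by = 'row',
>>                            chunksize = ceiling(nrow(datapoints) / N)))
>>     pred <- do.call(c, clusterApplyLB(cl, chunks,
>>                                       function(x) predict(model, x)))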
>>
>> A better approach is to avoid the duplication implied by
>> as.list(iter(<etc>)). This would require an implementation like
>> snow::dynamicClusterApply, where the first NUMCORES chunks of iter() are
>> forwarded to the workers, and then a loop is entered where the manager
>> receives one result and forwards the next chunk to the worker that
>> provided the result. Memory use would then be (1 + NUMCORES / N) *
>> sizeof(datapoints). Presumably this is the strategy taken by %dopar%.
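>>
>> A rough, untested sketch of that second approach, pulling chunks lazily
>> from the iterator instead of materializing them all with as.list().
>> dynamicClusterApply isn't really a documented user-level interface (use
>> snow::: if your version doesn't export it), and its argument conventions
>> may differ between snow versions, so treat this purely as a sketch:
>>
>>     library(iterators)
>>     N <- 4 * NUMCORES                                 # target number of chunks, again arbitrary
>>     chunksize <- ceiling(nrow(datapoints) / N)
>>     nchunks <- ceiling(nrow(datapoints) / chunksize)  # actual number of chunks produced
>>     it <- iter(datapoints, by = 'row', chunksize = chunksize)
>>
>>     fun <- function(x) predict(model, x)
>>     environment(fun) <- globalenv()                   # as discussed above
>>     argfun <- function(i) list(nextElem(it))          # hand out the next chunk on demand
>>
>>     pred <- do.call(c, snow::dynamicClusterApply(cl, fun, nchunks, argfun))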
>>
>> multicore should be the winner here, though, since all workers should
>> have access to the data without copying -- sizeof(datapoints) memory
>> use. I haven't used multicore extensively, and especially not on
>> windows. When you say "it didn't work well" it would be helpful to
>> understand why. My limited experimentation suggested no problems when
>> used with data sets that were not too close to the windows memory
>> limits. Perhaps you are really just running out of memory, and multicore
>> is not reporting this as nicely as it could? I'm sure the multicore
>> author would appreciate something more precise in terms of user experience.
>>
>> A final consideration is that calculations on the workers are likely to
>> duplicate a subset of datapoints, so that actual memory use will include
>> an additional component that scales approximately linearly with
>> NUMCORES. If the worker computations are memory intensive, then you'll
>> quickly find yourself in trouble again.
>>
>> Hope that helps,
>>
>> Martin
>>
>>
>>> predict(model, x))
>>>
>>>      ...
>>> }
>>>
>>> }
>>>
>>> When I ran this code, my RAM usage would go up from the initial 2.5 GB
>>> by roughly 500 MB after each run until it ate up the full 4 GB. So for
>>> n >= 3, the computer would throttle.
>
> I guess my previous response doesn't really address this pattern; it
> suggests that something is being retained across iterations of the for()
> loop, perhaps because in the code you don't show us (...) you keep a
> reference to a 500 MB object (iter(datapoints)? pred?) that you don't
> keep in the code below.
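>
> (A quick, hedged way to check: at the end of each iteration of the for()
> loop, print what is largest in scope and R's overall memory use, e.g.
>
>     sizes <- sapply(ls(), function(n) object.size(get(n)))
>     print(head(sort(sizes, decreasing = TRUE), 5))   # the largest objects in the frame
>     print(gc())                                      # R's overall memory use
>
> and rm() anything large that isn't needed for the next iteration.)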
>
> Martin
>
>>> When I found out about the new registerDoSnow, it improved my
>>> performance (props to Stephen Weston).  Here's the pseudo code for the
>>> equivalent of the above.
>>>
>>> someFunc = function(cl, ...) {
>>>   registerDoSnow(cl)
>>>   for (i in 1:n) {
>>>     model <- generateModel(Y[, i], X_mat)
>>>     pred <- foreach(dp = iter(datapoints, by = 'row',
>>>                               chunksize = floor(nrow(datapoints) / NUMCORES)),
>>>                     .combine = c, .verbose = TRUE) %dopar% {
>>>       predict(model, dp)
>>>     }
>>>   }
>>> }
>>>
>>> Aside from slightly shorter lines, the performance was more stable: the
>>> RAM usage ran from 2.5 up to roughly 3.2 GB and stayed there, and it
>>> performed better because it wouldn't run out of cache.
>>>
>>> However, I just want to understand the difference between the two
>>> treatments that would make such a large difference, and whether I am
>>> doing something wrong in the first example.
>>>
>>> Thanks,
>>>
>>> -Ivan
>>>
>>
>>
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>


