[R-sig-hpc] snow and foreach memory issue?

Zhang, Ivan ivan.zhang at bankofamerica.com
Fri Dec 11 20:35:14 CET 2009


Hi everyone,

I have a question regarding the RAM usage of snow and foreach.
I am running Windows XP with 4 GB of RAM and a 2.66 GHz Intel quad-core.

I recently tried to implement multicore processing using 'multicore' and
'foreach', but realized that multicore didn't work well on Windows, so I
switched to 'snow' and 'foreach', which work nicely.

I first hashed out my own method without %dopar%. However, for some
reason it ate up my RAM really quickly.
I wanted to see if anyone can figure out why. Perhaps there is something
I don't understand about snow, as I only recently started using it.

Suppose X_mat is a matrix of regressors for Y.

datapoints is a very large matrix of points.

The pseudo code is as follows: 
someFunc = function(cl, ...) {

for (i in 1:n) {
    model = generateModel(Y[,i], X_mat)
    dp = iter(datapoints, by='row',
              chunksize=floor(nrow(datapoints)/NUMCORES))

    # couldn't figure out how to get clusterExport to work within a
    # function, so copy the model into the global environment first
    assign("model", model, .GlobalEnv)
    # this only reads from globalenv?
    clusterExport(cl, "model")

    pred <- do.call(c, clusterApply(cl, as.list(dp),
                                    function(x) predict(model, x)))

    ...
}

}
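
On the clusterExport question in the comments above, here is a minimal
sketch, assuming a clusterExport that accepts an `envir` argument, as
the one in R's bundled `parallel` package does (whether the snow version
used here has it is not confirmed). The names `exportFromFunction` and
`local_model` are hypothetical stand-ins, not part of the code above:

```r
library(parallel)

# Hypothetical helper: export a variable that is local to a function
# without first copying it into .GlobalEnv.
exportFromFunction <- function(cl) {
  local_model <- list(coef = c(1, 2))   # stand-in for a fitted model
  # envir = environment() makes clusterExport look in this function's frame
  clusterExport(cl, "local_model", envir = environment())
  # each worker now holds its own copy in its global environment
  clusterEvalQ(cl, sum(local_model$coef))
}

cl <- makeCluster(2)                    # two local worker processes
worker_sums <- exportFromFunction(cl)   # one result per worker
stopCluster(cl)
```

With such an argument available, the assign() into .GlobalEnv would no
longer be needed.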

When I ran this code, my RAM usage would climb from an initial 2.5 GB by
about 500 MB after each run until it had eaten up all 4 GB. So for
n >= 3, the computer would slow to a crawl.

When I found out about the new registerDoSNOW, it improved my
performance (props to Stephen Weston). Here's the pseudo code for the
equivalent of the above.

someFunc = function(cl, ...) {
registerDoSNOW(cl)
for (i in 1:n) {
    model = generateModel(Y[,i], X_mat)
    pred <- foreach(dp=iter(datapoints, by='row',
                            chunksize=floor(nrow(datapoints)/NUMCORES)),
                    .combine=c, .verbose=TRUE) %dopar% {
        predict(model, dp)
    }
}
}

Aside from the slightly shorter code, the performance was more stable:
RAM usage went from 2.5 GB up to about 3.2 GB and stayed there, and the
code ran faster because it no longer ran out of RAM.

However, I would like to understand what differs between the two
approaches that makes such a large difference, and whether I am doing
something wrong in the first example.
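
One thing that might be worth checking here is how much data travels
with the function handed to clusterApply. In R, serializing a closure
that was created inside another function also serializes that
function's local variables, while a function whose environment is the
global environment is serialized by reference to it. The sketch below
only measures that effect. The names `makeLocalClosure`, `big` and
`topLevel` are hypothetical, and whether this accounts for the memory
growth in the first example is an assumption, not something confirmed:

```r
# Hypothetical stand-in for a function like someFunc that builds a
# closure while large objects are present in its local frame.
makeLocalClosure <- function() {
  big <- numeric(1e6)        # about 8 MB living in this function's frame
  function(x) x + 1          # closure whose environment contains `big`
}

# Same body, but defined at top level, so its environment is .GlobalEnv
topLevel <- function(x) x + 1

# serialize(obj, NULL) returns the raw bytes that would be sent to a worker
local_bytes  <- length(serialize(makeLocalClosure(), NULL))
global_bytes <- length(serialize(topLevel, NULL))

cat("closure defined in a function:", local_bytes, "bytes\n")
cat("function defined at top level:", global_bytes, "bytes\n")
```

If the first number turns out large for the real function, defining the
worker function at top level and passing the model as an explicit
argument would be one way to test whether it matters.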

Thanks,

-Ivan


