question about Rpvm, SNOW, etc.

Liaw, Andy andy_liaw@merck.com
Wed, 21 Aug 2002 09:33:58 -0400


> From: Michael Na Li [mailto:lina@u.washington.edu]
> Andy> 2.  I was hoping I can see similar gain with randomForest, but that
> Andy> doesn't seem to be the case:
> 
> >> system.time(iris.rf <- randomForest(iris[,1:4], iris[,5], ntree=10000))
> Andy> [1] 8.52 1.00 9.61 0.00 0.00
> >> system.time(cl.iris.rf <- clusterCall(cl, randomForest, iris[,1:4],
> Andy> +                                       iris[,5], ntree=5000))
> Andy> [1]  1.38  0.14 15.50  0.00  0.00
> 
> Andy> What am I missing here?  Is there anything I can do to see similar gain
> Andy> as the boot() example?
> 
> I tried this example and found that most of the extra time is overhead, the
> packing and unpacking of the messages.  When saving the object iris.rf, its
> size is over 12M.  So it might be desirable to process the returned result in
> each slave first and only return information needed.
> 
> I got similar timing with our cluster.  Saving and loading the object to/from
> a file require about 1.5 seconds each, which I assume is the cost of the
> serialization (plus file reading and writing).  Then it seems the packing (as
> bytes), transferring, and unpacking the object take 7-8 seconds??

I tried the following, returning only the votes from each node, and the
timing seems about right.  (I used a socket cluster with 3 nodes.  A PVM
cluster gave similar results.)

system.time(iris.rf <- randomForest(iris[,1:4], iris[,5],
                                    ntree=30000)$votes)
[1] 24.57  3.33 27.98  0.00  0.00

system.time(clusterEvalQ(cl, {data(iris); randomForest(iris[, 1:4],
                                      iris[, 5], ntree=10000)$votes}))
[1]  0.01  0.00 12.70  0.00  0.00

I was planning to add an alternative random forest interface that does not
return the entire forest (to save memory).  Seems like that item just got
promoted on my to-do list...
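
A minimal sketch of the "return only what is needed" idea, assuming each
node returns a matrix of raw vote counts (one row per observation, one
column per class) instead of the whole forest.  The helper combine_votes()
and the toy matrices w1 and w2 are hypothetical, made up for illustration;
they are not output from the runs above.

```r
# Hypothetical helper: add up per-node vote-count matrices (same rows,
# same class columns) and pick the majority class for each observation.
combine_votes <- function(vote_list) {
  total <- Reduce(`+`, vote_list)
  list(votes = total,
       predicted = colnames(total)[max.col(total, ties.method = "first")])
}

# Toy stand-ins for what two nodes might return (made-up counts).
classes <- c("setosa", "versicolor", "virginica")
w1 <- matrix(c(5, 1, 0,
               1, 2, 3), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, classes))
w2 <- matrix(c(4, 2, 0,
               0, 4, 2), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, classes))

res <- combine_votes(list(w1, w2))
print(res$votes)
print(res$predicted)
```

With a cluster like the one above, the list passed to combine_votes() could
be the result of the clusterEvalQ() call, provided the nodes return counts
rather than fractions (in current versions of randomForest, I believe
norm.votes=FALSE does this) and use different random number streams.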

Thanks very much for the help!!

Andy

> I wonder how much the serialization itself hurts the performance.  Would
> sending raw numbers with pvm routines improve the performance?
> 
> Michael
> 
> (BTW, is there a convenient function in R to examine the size of an object?)
> 
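
On examining the size of an object: object.size() gives an estimate of the
memory an R object occupies, and in current versions of R the length of
serialize(x, NULL) gives the number of bytes that would be packed for
sending.  A small sketch, using a made-up numeric vector x rather than the
iris.rf object discussed above:

```r
# x is a stand-in object for illustration only.
x <- numeric(1e6)

# Estimated memory used by the object, in bytes.
sz <- object.size(x)
print(format(sz, units = "Mb"))

# Number of bytes in the serialized form (what would be sent to a node).
n_bytes <- length(serialize(x, NULL))
print(n_bytes)
```

Comparing the two numbers for a fitted forest and for its $votes component
alone would show how much less data has to travel when only the votes are
returned.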
> -- 
> ---------------------------------------------------
> Michael Na Li
> Email: lina@u.washington.edu
> Department of Biostatistics, Box 357232
> University of Washington, Seattle, WA 98195  
> ---------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._