[R] R and Hadoop Integrated Processing Environment - RHIPE
Saptarshi Guha
saptarshi.guha at gmail.com
Sat Jan 24 19:08:55 CET 2009
Hello,
We have created an interface between R and Hadoop so that the user
can, after a fashion, interact with very large datasets
using the Map Reduce programming model. We also use IBM's TSpaces to
provide a shared memory space that can be
accessed from R (somewhat like networkspaces). RHIPE uses Rserve to
execute R code.
Some of the functions implemented are:
mrlapply - run lapply across a Hadoop cluster
mrsubsetf - subset a file according to an R function
mtapplyf - run a tapply on a file
mrmapreduce - run a map reduce algorithm on a file or group of files.
The user provides a mapper and reducer.
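
For readers new to the model, the division of work between a mapper and a reducer can be sketched in plain R on a single machine. This is only an illustration of the programming model, not RHIPE code: local_mapreduce, the key/value list layout and the word-count example are all hypothetical and make no claim about the actual arguments of mrmapreduce.

```r
# Single-machine sketch of the map/reduce model in base R.
# local_mapreduce() is a hypothetical helper, NOT part of RHIPE.
local_mapreduce <- function(records, mapper, reducer) {
  # Map: each record yields a list of list(key = , value = ) pairs.
  pairs <- unlist(lapply(records, mapper), recursive = FALSE)
  keys <- vapply(pairs, function(p) as.character(p$key), character(1))
  values <- lapply(pairs, function(p) p$value)
  # Shuffle: group the emitted values by key.
  grouped <- split(values, keys)
  # Reduce: combine all values that share a key.
  mapply(reducer, names(grouped), grouped, SIMPLIFY = FALSE)
}

# Example: count words across lines of text.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

mapper <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

reducer <- function(key, values) sum(unlist(values))

counts <- local_mapreduce(lines, mapper, reducer)
```

Here counts is a named list with one entry per distinct word. On a cluster the map and reduce steps would run on different nodes, with the grouping done by the framework.
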
There are also some shared memory operations such as mrread, mrtake and mrput.
Currently, it is at a proof of concept stage and much work is required
before it is production ready. However, for the adventurous, it is
possible to use it to process large data.
For more information and examples, please visit this page:
http://www.stat.purdue.edu/~sguha/rhipe
If anyone would like to contribute to this project, please email me
directly - any help is welcome.
Regards
Saptarshi Guha