[R-pkgs] rZeppelin: An R notebook that makes Spark easy to use

Amos B. Elberg amos.elberg at gmail.com
Mon Mar 7 20:33:40 CET 2016


rZeppelin is an R interpreter for Apache Zeppelin (incubating).  Zeppelin
is a web-based notebook, similar to IPython, built around Apache Spark.

rZeppelin makes it possible, for the first time, to build a single data/ML
pipeline that mixes R, Scala, and Python code seamlessly from a single
interface.  (Without breaking lazy evaluation!)

For R-using data scientists, this means you can access the full power of
Spark, including fast distributed implementations of popular algorithms,
from R, without having to learn Scala, without a dedicated administrator
to manage a Spark or Hadoop cluster, and without spending more than
minimal time reviewing the SparkR API.
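
For instance, a couple of SparkR calls are usually all it takes to push an
ordinary R data.frame into Spark and aggregate it on the cluster.  This is
only a sketch: it assumes the R paragraphs expose SparkR's sqlContext, and
the %r prefix stands in for however the rZeppelin interpreter is registered
on your system.

    %r
    # a local R data.frame becomes a distributed Spark DataFrame
    df <- createDataFrame(sqlContext, faithful)
    # group and count on the cluster, then pull a few rows back into R
    head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))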

You can load text data using R, quickly build an LDA model with Spark's
distributed LDA implementation, tag the text using gensim from Python, and
then visualize and take further steps in R, all within a single session and
a single interface.
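
A minimal sketch of what such a mixed pipeline looks like in the notebook
is below.  The interpreter prefixes (%r, %pyspark), the file name, and the
cross-language type conversions are assumptions, and the LDA/gensim steps
are condensed to a trivial placeholder:

    %r
    # R paragraph: read and lightly normalize the raw text
    # ("reviews.txt" is a hypothetical file on the Zeppelin host)
    docs <- tolower(readLines("reviews.txt"))
    .z.put("docs", docs)

    %pyspark
    # Python paragraph: pick the documents back up; a real pipeline would
    # run gensim here -- this placeholder just counts tokens per document
    docs = z.get("docs")
    lengths = [len(d.split()) for d in docs]
    z.put("lengths", lengths)

    %r
    # R paragraph: retrieve the Python result and plot it
    hist(unlist(.z.get("lengths")), main = "Tokens per document")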

The full range of Spark libraries, including MLlib and GraphX, which used
to require Scala development, can be used in the same pipeline with R.
(The exception is Spark Streaming, which Zeppelin does not yet support.)

Beyond Spark, R data can be visualized using Zeppelin's built-in
interactive visualizations.  rZeppelin also leverages knitr, which makes
most R visualization and interactive-visualization packages available
inside the notebook.
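
For example, an ordinary plotting paragraph is rendered inline through
knitr.  (A sketch: it assumes ggplot2 is installed on the Zeppelin host and
that %r is the interpreter prefix.)

    %r
    library(ggplot2)
    # knitr captures the plot and Zeppelin shows it in the notebook output
    ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()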

Many data types can also be moved easily between R, Scala, and Python: the
languages share a ZeppelinContext, to which variables can be added and from
which they can be retrieved with .z.put() and .z.get().
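
In its simplest form this looks like the sketch below (the exact value
conversion between languages is an assumption):

    %spark
    // Scala paragraph: put a value into the shared ZeppelinContext
    z.put("k", 10)

    %r
    # R paragraph: retrieve the same value
    k <- .z.get("k")
    print(k)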

rZeppelin is intended to make Spark part of the R data scientist’s daily
toolbox.

rZeppelin is available here:  https://github.com/elbamos/Zeppelin-With-R
