[Rd] How to efficiently share data (a dataframe) between R and Java
Simon Urbanek
simon.urbanek at r-project.org
Mon Dec 7 03:19:38 CET 2015
On Dec 6, 2015, at 12:36 PM, Ing. Jaroslav Kuchař <jaroslav.kuchar at fit.cvut.cz> wrote:
> Dear all,
>
> in our ongoing project we use Java implementations of several
> algorithms. We also provide a “wrapper” implemented as an R package
> using rJava (https://github.com/jaroslav-kuchar/rCBA). Based on our
> recent experiments, the significant portion of time is spent on copying
> a dataframe from R to Java. The Java implementation needs access to the
> source dataframe.
>
> I have tested several approaches: calling Java method row-by-row;
> serialize the whole data-frame to a temp file and parsing in Java; or
> row binding to a single vector and calling a single Java method. Each
> approach has its limitations e.g. time-consuming row-by-row copying,
> serialization and parsing performance or memory limitations of a single
> vector.
>
> Is there an efficient approach how to copy a dataframe from R to Java
> and another one from Java to R?
>
> Thanks for any help you can provide...
>
You can access the native structures on each side directly. The fastest way is to use R's column-oriented representation in Java: you pass each variable (column) as a whole, which is much faster than any kind of serialization or any of the approaches you mention above.
Typically, the bottleneck is the Java application, which may require very inefficient data structures. If you have control over the algorithms, you can simply use proper data structures and avoid that problem. If you don't, you'll have to add Java code that converts the data frame pushed to the Java side into whatever structure the existing Java code needs. The main point is that you do NOT want to do any conversion on the R side.
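To make that concrete, here is a minimal sketch of a Java-side holder that receives one array per column and only builds rows in Java if the algorithm really needs them. The class name ColumnFrame and its methods are hypothetical, not part of rJava or rCBA, and the sketch assumes only numeric, integer and character columns (factors, NAs and other types would need extra handling).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for a data frame pushed from R one column at a time.
final class ColumnFrame {
    // Column name -> double[], int[] or String[], kept in insertion order.
    private final Map<String, Object> columns = new LinkedHashMap<>();
    private int nrow = -1; // -1 until the first column arrives

    // An R numeric vector would arrive as double[].
    public void addColumn(String name, double[] values) { put(name, values, values.length); }

    // An R integer vector would arrive as int[].
    public void addColumn(String name, int[] values) { put(name, values, values.length); }

    // An R character vector would arrive as String[].
    public void addColumn(String name, String[] values) { put(name, values, values.length); }

    private void put(String name, Object values, int length) {
        if (nrow >= 0 && length != nrow) {
            throw new IllegalArgumentException("column '" + name + "' has " + length
                    + " values, expected " + nrow);
        }
        nrow = length;
        columns.put(name, values);
    }

    public int nrow() { return Math.max(nrow, 0); }

    // Row-oriented view, built on the Java side and only when required.
    public List<Map<String, Object>> toRows() {
        List<Map<String, Object>> rows = new ArrayList<>(nrow());
        for (int i = 0; i < nrow(); i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : columns.entrySet()) {
                Object col = e.getValue();
                Object cell;
                if (col instanceof double[]) {
                    cell = ((double[]) col)[i];
                } else if (col instanceof int[]) {
                    cell = ((int[]) col)[i];
                } else {
                    cell = ((String[]) col)[i];
                }
                row.put(e.getKey(), cell);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        ColumnFrame frame = new ColumnFrame();
        frame.addColumn("x", new double[] {1.5, 2.5});
        frame.addColumn("label", new String[] {"a", "b"});
        System.out.println(frame.toRows());
    }
}
```

On the R side this would amount to one call per column, something along the lines of .jcall(frame, "V", "addColumn", "x", .jarray(df$x)), where frame is the object created with .jnew. That makes the number of R-to-Java calls proportional to the number of columns rather than the number of rows.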
Cheers,
Šimon