[Rd] Suggestion: Create On-Disk Dataframes; SparkR

frederik at ofb.net frederik at ofb.net
Mon Sep 4 21:48:11 CEST 2017


What's wrong with SparkR? I had never heard of either Spark or SparkR.

For on-disk dataframes there is a package called 'ff'. I looked into
using it; it works well, but there are some drawbacks with the
implementation. I think that it should be possible to mmap an object
from disk and use it as a vector, but 'ff' is doing something else:

https://github.com/edwindj/ffbase/issues/52
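To make the mmap idea concrete, here is a rough sketch. It is in
Python rather than R, purely for illustration, and it says nothing
about how 'ff' or R itself work internally. The helper names
(create_vector_file, open_vector) and the file name vec.bin are made
up for this example, and the file layout (raw native-endian 8-byte
doubles, no header) is an assumption.

```python
import mmap
import os
import struct
import tempfile


def create_vector_file(path, values):
    """Write values to path as raw native-endian 8-byte doubles (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}d", *values))


def open_vector(path):
    """Map the whole file and return (mapping, view of it as doubles)."""
    with open(path, "r+b") as f:
        # Length 0 maps the entire file; the mapping outlives the file object.
        m = mmap.mmap(f.fileno(), 0)
    # Reinterpret the mapped bytes as a flat vector of doubles, without copying.
    return m, memoryview(m).cast("d")


path = os.path.join(tempfile.mkdtemp(), "vec.bin")
create_vector_file(path, [1.0, 2.0, 3.0])

# "Session" 1: use the file as an ordinary vector and modify it in place.
m, v = open_vector(path)
v[1] = 20.0
total = sum(v)
v.release()  # the view must be released before the mapping can be closed
m.flush()
m.close()

# "Session" 2: map the file again; the earlier write is still there.
m2, v2 = open_vector(path)
persisted = list(v2)
v2.release()
m2.close()
```

The point is only that the OS pages the data in on demand and the
modification persists in the file without an explicit save step.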

I think you'd need something called a "weak reference" to do this
properly:

http://homepage.divms.uiowa.edu/~luke/R/references/weakfinex.html
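One reading of "properly" here is that the mapping should be released
automatically once nothing refers to the vector any more. The sketch
below shows that pattern in Python, using weakref.finalize as a
stand-in for R's weak references and finalizers; it is an analogy,
not a description of the R API. The class MappedBytes and the log
list are hypothetical names used only for illustration.

```python
import gc
import mmap
import os
import tempfile
import weakref


class MappedBytes:
    """Hypothetical wrapper that unmaps its file when it is garbage collected."""

    def __init__(self, path, log):
        with open(path, "r+b") as f:
            self._map = mmap.mmap(f.fileno(), 0)
        # The callback receives the mapping, not self; a reference to self
        # here would keep the object alive forever.
        self._finalizer = weakref.finalize(
            self, MappedBytes._cleanup, self._map, path, log
        )

    @staticmethod
    def _cleanup(mapping, path, log):
        mapping.close()   # unmap the file
        log.append(path)  # record that cleanup ran (for the demonstration)

    def __getitem__(self, i):
        return self._map[i]

    def __len__(self):
        return len(self._map)


path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x01\x02\x03")

closed = []
vec = MappedBytes(path, closed)
first = vec[0]          # indexing an mmap yields an int
ref = weakref.ref(vec)  # a weak reference does not keep vec alive

del vec                 # drop the last strong reference
gc.collect()            # the finalizer has run by now
```

After the last strong reference is dropped, the weak reference is dead
and the cleanup callback has closed the mapping, with no explicit
close call from the user.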

I don't know what SparkR is doing under the hood.

Then again, I was mostly interested in having large data sets which
persist across R sessions, while Juan seems to be interested in
supporting data which doesn't fit in RAM. But if something doesn't fit
in RAM, it can be swapped out to disk by the OS, no? So I'm not sure
why you'd want a special interface for that situation, aside from
giving the programmer more control.

Thanks,

Frederick

On Mon, Sep 04, 2017 at 07:43:50AM -0500, Dirk Eddelbuettel wrote:
> 
> On 4 September 2017 at 11:35, Suzen, Mehmet wrote:
> | It is not needed. There is a large community of developers using SparkR.
> | https://spark.apache.org/docs/latest/sparkr.html
> | It does exactly what you want.
> 
> I hope you are not going to mail a sparkr commercial to this list every day.
> As the count is now at two, this may be an excellent time to stop it.
> 
> Dirk
> 
> -- 
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


