[Rd] arbitrary size data frames or other structs, curious about issues involved.

Jay Emerson jayemerson at gmail.com
Mon Jun 20 21:12:56 CEST 2011


Neither bigmemory nor ff are "drop in" solutions -- though useful,
they are primarily for data storage and management and allowing
convenient access to subsets of the data.  Direct analysis of the full
objects via most R functions is not possible.  There are many issues
that could be discussed here (and have been, previously), including
R's use of 32-bit integer indexing.  There is a nice section, "Future
Directions", in the R Internals manual that you might want to look at.
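To make the "not a drop-in" point concrete, here is a minimal sketch of the usual bigmemory workflow, assuming the bigmemory package is installed; the file names and dimensions are illustrative only:

```r
## Minimal sketch (assumes the bigmemory package is installed;
## file names and sizes here are illustrative).
library(bigmemory)

## A file-backed big.matrix lives on disk; only the parts you
## index are pulled into ordinary R memory.
x <- filebacked.big.matrix(nrow = 1e6, ncol = 3, type = "double",
                           backingfile = "x.bin",
                           descriptorfile = "x.desc")
x[, 1] <- rnorm(1e6)

## Subsetting returns a regular R object, so standard functions
## work on the extracted piece ...
mean(x[, 1])

## ... but the big.matrix itself is not a matrix or data.frame,
## so passing it directly to most R functions does not work.
```

The point is that the analysis still happens on ordinary R objects extracted from the disk-backed store, not on the store itself.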


-------------------------------------  Original message:

We keep getting questions on r-help about memory limits, and
I was curious to know what issues are involved in making
common classes like data.frame work with disk and intelligent
swapping.  That is, sure, you can always rely on the OS for
virtual memory, but in theory it should be possible to build a
data structure that somehow knows which pieces you will access
next and can keep those somewhere fast.  Of course, algorithms
"should" act locally and be block oriented, but in any case they
could communicate upcoming access patterns to the data
structures, see a few ms into the future, and have the right
data prefetched.
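The block-oriented style described above is roughly what the ff package supports today; a minimal sketch, assuming ff is installed (the vector length is illustrative):

```r
## Minimal sketch (assumes the ff package is installed;
## the vector length is illustrative).
library(ff)

## An ff vector is disk-backed; data are paged in as needed.
x <- ff(vmode = "double", length = 1e7)

## Block-oriented processing: chunk() yields index ranges, so
## only one window of the data is in RAM at a time.
total <- 0
for (i in chunk(x)) {
  total <- total + sum(x[i])
}
```

This still requires the algorithm to be written chunk-wise; nothing prefetches automatically behind an ordinary data.frame interface.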

I think things like "bigmemory" exist, but perhaps one issue
was that they could not simply drop in for data.frame -- or do
they already solve all the problems?

Is memory management just a non-issue, or is there something
that needs to be done to make large data structures work well?

John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University

More information about the R-devel mailing list