R devel question
Peter Dalgaard BSA
p.dalgaard@biostat.ku.dk
12 Oct 1998 13:37:04 +0200
David Mosberger <david_mosberger@hp.com> writes:
> Hi Peter,
>
> Hope you don't mind my asking you this directly (I wasn't sure if it
> would be appropriate to post this question to R-devel;
It would.
> if you think it
> is, please feel free to forward it).
Done.
> I'm wondering whether there are any plans to extend R so it could
> handle arbitrarily large objects. For the particular application I
> have in mind, the objects would be several hundred MB in size (in
> uncompressed form).
Some ideas involving "virtual objects" and database interfaces have
been vented on various occasions, but there are no immediate plans. It
*should* happen at some point, I think, but just now we're busy trying
to get the documentation in sync with the implementation and getting
rid of bugs.
> One way I think this could be handled is to leave the basic data types
> presented by R to the user unchanged, but to offer different
> implementation choices for those user data types. For example, an
> "array of structures" is presently loaded into memory in its entirety
> when accessed. For large data objects, this isn't ideal. An
> alternative implementation would be to store such a big array on disk
> and load into memory only the parts that are really needed. In other
> words, the incore representation would simply be a cache of the entire
> data object. Of course, once this is done you could also vary the
> external representation of objects. For example, instead of storing
> the array elements one after another, it could often be advantageous
> to store the values of each field next to each other (so that
> operations like "compute the average of the .age field" could be
> performed efficiently). Yet another variation might be to add
> on-the-fly compression/decompression to minimize the size of the
> external data file.
>
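The field-sequential layout described above can be sketched roughly as
follows. This is only an illustration in Python, not anything R does or
plans to do: the file layout, the two fields (id, age), and all function
names are assumptions invented for the example. Each field is written
contiguously, and the average of the age field is then computed by
reading only that part of the file, one chunk at a time.

```python
import os
import struct
import tempfile


def write_field_sequential(path, records):
    """Write (id, age) records with each field stored contiguously.

    Hypothetical layout: n little-endian int32 ids, then n float64 ages.
    """
    ids = [rec[0] for rec in records]
    ages = [rec[1] for rec in records]
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(ids)}i", *ids))
        f.write(struct.pack(f"<{len(ages)}d", *ages))


def mean_age_field_sequential(path, n, chunk=1024):
    """Average the age column while holding at most `chunk` values in memory."""
    total = 0.0
    with open(path, "rb") as f:
        f.seek(4 * n)  # skip the id column (4 bytes per int32)
        remaining = n
        while remaining:
            k = min(chunk, remaining)
            buf = f.read(8 * k)  # 8 bytes per float64
            total += sum(struct.unpack(f"<{k}d", buf))
            remaining -= k
    return total / n


if __name__ == "__main__":
    records = [(i, float(20 + i)) for i in range(10)]
    path = os.path.join(tempfile.mkdtemp(), "people.col")
    write_field_sequential(path, records)
    print(mean_age_field_sequential(path, len(records), chunk=3))
```

With a record-by-record layout the same average would have to read past
every id on the way to each age; here the id column is skipped with a
single seek.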
> If this approach were taken, I'd imagine that R would continue to use
> the "store entirely in memory" approach by default to maintain
> backwards compatibility. At the same time, a few new functions could
> be introduced that would allow precise control over how the object is
> implemented. So when the user wants to deal with a large object, they
> would create the object, set its implementation to something suitable
> (e.g., cache-only, field-sequential layout, on-the-fly compression)
> and then continue to use the object as usual.
>
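The "same object, selectable implementation" idea above might look
something like the sketch below. Again this is Python and purely
illustrative: the class names, the set_implementation method, and its
option names are hypothetical and do not correspond to any R function.
For brevity the compressed variant keeps its compressed chunks in memory
rather than in an external file, and only non-negative indices are
supported.

```python
import struct
import zlib
from collections import OrderedDict


class InMemoryBackend:
    """Default: the whole vector lives in memory."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def get(self, i):
        return self._values[i]


class CompressedChunkBackend:
    """Values kept as zlib-compressed chunks, decompressed on demand.

    A small least-recently-used cache holds the decompressed chunks.
    """

    def __init__(self, values, chunk_size=1024, cache_chunks=4):
        values = list(values)
        self._n = len(values)
        self._chunk_size = chunk_size
        self._cache_chunks = cache_chunks
        self._cache = OrderedDict()
        self._chunks = []
        for start in range(0, self._n, chunk_size):
            part = values[start:start + chunk_size]
            self._chunks.append(zlib.compress(struct.pack(f"<{len(part)}d", *part)))

    def __len__(self):
        return self._n

    def _load(self, c):
        if c in self._cache:
            self._cache.move_to_end(c)  # mark as most recently used
            return self._cache[c]
        raw = zlib.decompress(self._chunks[c])
        chunk = struct.unpack(f"<{len(raw) // 8}d", raw)
        self._cache[c] = chunk
        if len(self._cache) > self._cache_chunks:
            self._cache.popitem(last=False)  # evict least recently used
        return chunk

    def get(self, i):
        c, offset = divmod(i, self._chunk_size)
        return self._load(c)[offset]


class BigVector:
    """User-facing vector of floats; indexing is the same for every backend."""

    def __init__(self, values):
        self._backend = InMemoryBackend(values)

    def set_implementation(self, name, **options):
        values = [self._backend.get(i) for i in range(len(self._backend))]
        if name == "in-memory":
            self._backend = InMemoryBackend(values)
        elif name == "compressed":
            self._backend = CompressedChunkBackend(values, **options)
        else:
            raise ValueError(f"unknown implementation: {name!r}")

    def __len__(self):
        return len(self._backend)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._backend.get(i)


if __name__ == "__main__":
    v = BigVector(float(i) for i in range(5000))
    v.set_implementation("compressed", chunk_size=256, cache_chunks=2)
    print(len(v), v[1234], sum(v))
```

Code that only indexes or iterates over the vector does not need to know
which implementation is active, which is the backwards-compatibility
property the proposal asks for.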
> Since I'm not familiar with the internals of R, I have no idea how
> easy/hard this would be and I'd therefore appreciate hearing your
> opinion on whether you think this would be a valuable and doable
> extension.
>
> In any case, thanks for working on R! I was excited to find that I
> now have the option to use the S language on my Linux systems!
>
> Cheers,
>
> --david
>
> --
> David Mosberger, Ph.D; HP Labs; 1501 Page Mill Rd MS 1U17; Palo Alto, CA 94304
> davidm@hpl.hp.com voice (650) 236-2575 fax 857-5100
>
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907