Representation of data in libraries
Martin Maechler <maechler@stat.math.ethz.ch>
Wed, 25 Feb 1998 08:34:03 +0100
>>>>> "DougB" == Douglas Bates <bates@stat.wisc.edu> writes:
DougB> At present the example data sets in R libraries are to be given as
DougB> expressions that can be read directly into R. For example, the acid.R
DougB> file in the main library looks like
DougB>   acid <- data.frame(
DougB>       carb = c(0.1, 0.3, 0.5, 0.6, 0.7, 0.9),
DougB>       optden = c(0.086, 0.269, 0.446, 0.538, 0.626, 0.782),
DougB>       row.names = paste(1:6))
DougB> This is great when you have only a few observations. I have one
DougB> example data set with over 9000 rows and 17 variables. Even when I
DougB> set -v 40, I exhaust the available memory trying to read it in as a
DougB> data.frame. I believe this is because of the recursive nature of the
DougB> parsing of data objects.
yes;
DougB> Are there alternatives that would cause less memory usage?
yes; but only in the 0.62 development version.
The current 0.62 ``standard'' is:
  - if a 'data' file ends in .R, source(.) is used to read it;
  - if it ends in .tab, read.table(..., header = TRUE) is used to read it.
(You can find the new data(.) function in src/library/base/data in R-snapshot.)
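
A minimal sketch of this kind of extension-based dispatch, written against
present-day R rather than 0.62. The name load_example_data and its arguments
are hypothetical; this is not the actual data(.) code:

```r
# Hypothetical helper (NOT the real data() implementation):
# choose a reader from the file extension, as described above.
load_example_data <- function(path, envir = parent.frame()) {
  ext  <- sub(".*\\.", "", basename(path))       # text after the last dot
  name <- sub("\\.[^.]*$", "", basename(path))   # file name without extension
  switch(ext,
    # .R files are evaluated; they are expected to create the object themselves
    R   = sys.source(path, envir = envir),
    # .tab files are read as a table and assigned under the file's base name
    tab = assign(name, read.table(path, header = TRUE), envir = envir),
    stop("unsupported data file extension: ", ext))
  invisible(name)
}
```

With a file acid.tab whose first line holds the column names, calling
load_example_data("acid.tab") would leave a data frame called acid in the
calling environment.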
Note that this is still not really satisfactory for large data files,
since read.table(.) is not really efficient:
it first reads everything as a character matrix and then converts it
variable by variable, some to numeric, some to factor.
On the other hand: does it really make sense to distribute huge example
data sets such as yours above?
If yes, AND if you have only numeric data,
I'd propose the following:
  1) Create a file <pkg>/data/dougBex.R which contains only something like

         dougBex <- as.data.frame(
             matrix(scan(system.file("<pkg>/data/dougBex.dat")),
                    ncol = ...,
                    dimnames = ...))

  2) Create <pkg>/data/dougBex.dat to contain all your data as
     white-space delimited numbers.
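
The two steps can be tried out on a toy file. The sketch below is only an
illustration under assumptions: a temporary file stands in for
<pkg>/data/dougBex.dat, and the column names and the three rows of values
are made up for the example. Note that matrix() fills column by column unless
byrow = TRUE is given, so the sketch assumes the .dat file holds one
observation per line:

```r
# Stand-in for <pkg>/data/dougBex.dat (hypothetical contents):
# white-space delimited numbers, one observation per line.
dat_file <- tempfile(fileext = ".dat")
writeLines(c("0.1 0.086",
             "0.3 0.269",
             "0.5 0.446"), dat_file)

var_names <- c("carb", "optden")   # assumed variable names

# scan() reads the numbers straight into a numeric vector, with no
# intermediate character matrix; matrix() then reshapes them.
dougBex <- as.data.frame(
  matrix(scan(dat_file, quiet = TRUE),
         ncol = length(var_names),
         byrow = TRUE,             # the file is laid out row by row
         dimnames = list(NULL, var_names)))
```

If the .dat file were instead written variable by variable, byrow = TRUE
would have to be dropped.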
DougB> In S/S-PLUS the data.dump/data.restore functions use a portable
DougB> representation that can be parsed without exponential memory growth.
hmm, yes, we have been longing for someone to write data.dump/data.restore
for R.
Any volunteers?
--
Martin Maechler <maechler@stat.math.ethz.ch> <><
Seminar fuer Statistik, ETH-Zentrum SOL G1; Sonneggstr.33
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1086
http://www.stat.math.ethz.ch/~maechler/