An idea for something better than read.table

Kjetil Halvorsen khal@alumni.uv.es
Wed, 24 Feb 1999 21:54:47 +0100


It is nice if data files can have formats that are not too heavily
dependent on the package. What I do to read in data is to keep the
data (with a header) in, say, data.dat, and then have data.R
with the commands defining factors, levels, contrasts or
whatever. That seems cleaner than mixing data and definitions
in one file.
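A minimal sketch of this two-file arrangement, assuming a whitespace-separated data.dat whose first line is a header. The rows are made up for illustration, and tempfile() stands in for the real data.dat. The commands after the "data.R" comment are what that script might hold, so that source("data.R") would apply the definitions while the data file itself stays package-neutral.

```r
# Stand-in for data.dat: made-up rows, header line first.
dat_file <- tempfile(fileext = ".dat")
writeLines(c("Item Size Year",
             "1    0    1980",
             "1    10   1981",
             "2    5    1981",
             "4    20   1981"), dat_file)

# --- what data.R might contain ---
# Read the plain data, then apply factor, level and contrast definitions.
d <- read.table(dat_file, header = TRUE)
d$Item <- factor(d$Item, levels = 1:4, labels = c("A", "B", "C", "D"))
d$Year <- factor(d$Year, levels = 1980:1985)
contrasts(d$Item) <- contr.sum(4)   # sum-to-zero contrasts for Item
str(d)
```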

Kjetil Halvorsen


Peter Dalgaard BSA wrote:
> 
> I was recently converting some datasets for use in an R package and it
> occurred to me that there really is no "neat" way to input a data
> frame if it is to contain factor variables.
> 
> One can use dput()/source or dump() after massaging data into the
> right format, of course, but there isn't really anything which allows
> you to store the input instructions with the data beyond the simple
> header=T type format.
> 
> So I thought of ways to enhance the header. The best idea I've been
> able to come up with so far is to
> 
> (a) Write a function - basically an extension of scan() - which allows
>     you to specify the column data type in more detail. Let's call it
>     data.file() for now. It would pretty much have to deparse all of
>     its arguments and interpret things in slightly unusual ways, but R
>     can do that, and some functions (notably help() and data())
>     already play this kind of game with the parser...
> 
> (b) Have a function, say read(), which parses the 1st expression in a
>     file and executes it *with the remainder of the file as the
>     argument*. (Currently, this is impossible, but it would become possible
>     if one just kept track of the line number while parsing. parse()
>     could stick it on as an attribute of the parsed expression list if
>     asked to do so.)
> 
> This would make a file format something like the following possible.
> 
> [There's another loose idea in there involving a control item to handle
> separators, na.strings, etc. - the intention being that read() plugs
> in the file= and skip= arguments for the actual call.]
> 
> Would this be an approach worth pursuing?
> 
> --- Top of file ---
> data.file(control(sep="w",na="."),
>         Item = factor(levels=1:4,labels=c("A","B","C","D")),
>         Size = numeric(),
>         Year = factor(levels=1980:1985)
> )
> 1       0     1980
> 1       10    1981
> 1       14    1982
> 1       20    1983
> 1       25    1984
> 1       30    1985
> 2       0     1980
> 2       5     1981
> 2       6     1982
> 2       8     1984
> 3       0     1984
> 3       2     1985
> 4       0     1980
> 4       20    1981
> 4       30    1982
> 4       30    1984
> 4       35    1985
> --- End of file ---
> 
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
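Proposal (b) can be approximated in present-day R without parser support. The sketch below is hypothetical: read_self_describing() is an illustrative name, not an existing function. Instead of having parse() report line numbers, it finds the end of the header by re-parsing the first k lines until they form one complete expression. The header is never evaluated as a call to data.file(); only its arguments are inspected. Further assumptions: sep="w" means "whitespace", a factor(...) spec is called with the column as its first argument, any other spec such as numeric() maps to the matching as.*() function, and skip= and na.strings are handed to read.table(), much as the bracketed note suggests. The data rows in the usage part are a shortened version of the example, plus a "." to exercise the NA handling.

```r
# Hypothetical reader for the proposed self-describing format; not an R API.
read_self_describing <- function(path) {
  lines <- readLines(path)
  # Locate the header: the smallest k for which lines 1..k parse as one
  # complete expression (stands in for parse() reporting line numbers).
  header <- NULL
  for (k in seq_along(lines)) {
    header <- tryCatch(parse(text = lines[seq_len(k)]),
                       error = function(e) NULL)
    if (!is.null(header)) break
  }
  if (is.null(header)) stop("no complete header expression found")

  # Unevaluated arguments of data.file(...); names are "" when unnamed.
  spec <- as.list(header[[1]])[-1]
  nms <- names(spec)
  if (is.null(nms)) nms <- rep("", length(spec))

  # The first unnamed argument, if any, is taken to be control(...).
  ctrl <- if (any(nms == "")) as.list(spec[[which(nms == "")[1]]])[-1] else list()
  sep <- if (is.null(ctrl$sep) || identical(ctrl$sep, "w")) "" else ctrl$sep
  na  <- if (is.null(ctrl$na)) "NA" else ctrl$na

  # Read the remainder of the file, skipping the k header lines.
  cols <- nms[nms != ""]
  d <- read.table(path, skip = k, header = FALSE, sep = sep,
                  na.strings = na, col.names = cols)

  # Apply each column spec: factor(...) gets the column plus its arguments;
  # anything else, e.g. numeric(), maps to as.numeric() and friends.
  for (nm in cols) {
    call <- spec[[nm]]
    fname <- as.character(call[[1]])
    args <- lapply(as.list(call)[-1], eval, envir = baseenv())
    d[[nm]] <- if (fname == "factor") {
      do.call(factor, c(list(d[[nm]]), args))
    } else {
      match.fun(paste0("as.", fname))(d[[nm]])
    }
  }
  d
}

# Usage: a shortened file in the proposed format.
path <- tempfile(fileext = ".dat")
writeLines(c('data.file(control(sep="w",na="."),',
             '        Item = factor(levels=1:4,labels=c("A","B","C","D")),',
             '        Size = numeric(),',
             '        Year = factor(levels=1980:1985)',
             ')',
             '1       0     1980',
             '2       5     1981',
             '4       .     1984'), path)
d <- read_self_describing(path)
str(d)
```

Re-parsing from line 1 costs time quadratic in the header length, which is harmless for a header of a few lines; a parse() that recorded line numbers, as suggested in (b), would avoid it.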