[R] Programcode and data in the same textfile
Ernst Hansen
erhansen at math.ku.dk
Thu Jun 12 14:39:35 CEST 2003
I have the following problem. It is not of earthshaking importance,
but still I have spent a considerable amount of time thinking about
it.
PROBLEM: Is there any way I can have a single textfile that contains
both
a) data
b) programcode
The program should act on the data, if the textfile is source()'ed
into R.
BOUNDARY CONDITION: I want the data written in the textfile in exactly
the same format as I would use if I had the data in a separate textfile,
to be read by read.table(). That is, with 'horizontal inhomogeneity'
(the entries in a row may differ in type) and 'vertical homogeneity'
(the entries in a column all have the same type). I want to write
something like
    Sex     Respons
    Male    1
    Male    2
    Female  3
    Female  4
In effect, I am asking if there is some way I can convince
read.table() that the data are contained in the following n lines of
text.
ILLEGAL SOLUTIONS:
I know I can simulate the behaviour by reading the columns of the
dataframe one by one, and using data.frame() to glue them together.
Like in
    data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
               Respons = c(1, 2, 3, 4))
I do not like this solution, because it represents the data in a
"transposed" way in the textfile, and this transposition makes the
structure of the dataframe less transparent - at least to me. It
becomes even less comprehensible if the Sex-factor above is written
with the help of rep() or gl() or the like.
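To make the last point concrete, a sketch of the gl() variant of the
same dataframe might look like this (the name d is just for
illustration):

```r
# The same four-row dataframe, but with the Sex column generated by gl()
# instead of being typed out row by row.
# gl(2, 2, labels = ...) gives a factor with 2 levels, each repeated 2 times.
d <- data.frame(Sex = gl(2, 2, labels = c("Male", "Female")),
                Respons = 1:4)
print(d)
```

The code is shorter, but the reader has to expand gl() mentally to see
which row gets which level.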
I know I can make read.table() read from stdin, so I could type the
dataframe at the prompt. That is against the spirit of the problem,
as I describe below.
I know I can make read.table() do the job if I split the data and the
programcode into different files. But as the purpose of the exercise
is to distribute the data and the code to other people, splitting
into several files is a complication.
MOTIVATION: I frequently find myself distributing small chunks of code
to my students, along with data on which the code can work.
As an example, I might want to demonstrate how model.matrix() treats
interactions, in a certain setting. For that I need a dataframe that
is complex enough to exhibit the behaviour I want, but still so small
that the model.matrix is easily understood. So I make such a
dataframe.
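A minimal sketch of the kind of demonstration I have in mind, assuming
a made-up numeric covariate Dose added to the small dataframe above
(Dose and the names d and X are hypothetical, chosen only for this
illustration):

```r
# A small dataframe: one factor, one numeric covariate (Dose is invented
# for this illustration), and a response.
d <- data.frame(Sex = factor(c("Male", "Male", "Female", "Female"),
                             levels = c("Male", "Female")),
                Dose = c(1, 2, 1, 2),
                Respons = c(1, 2, 3, 4))

# Design matrix for main effects plus the Sex:Dose interaction,
# using R's default treatment contrasts for the factor.
X <- model.matrix(~ Sex * Dose, data = d)
print(X)
```

With only four rows, each column of the printed matrix can be checked
by hand against the dataframe.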
I am trying to distribute this dataframe along with my code, in a way
that is as simple as possible to USE for the students (hence the
one-file boundary condition) and to READ (hence the non-transposition
boundary condition).
Does anybody have any ideas?
Ernst Hansen
Department of Statistics
University of Copenhagen