[R] Programcode and data in the same textfile
Torsten Hothorn
Torsten.Hothorn at rzmail.uni-erlangen.de
Thu Jun 12 15:00:26 CEST 2003
> I have the following problem. It is not of earthshaking importance,
> but still I have spent a considerable amount of time thinking about
> it.
>
> PROBLEM: Is there any way I can have a single textfile that contains
> both
>
> a) data
>
> b) programcode
>
> The program should act on the data, if the textfile is source()'ed
> into R.
>
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly
> the same format as I would use, if I had data in a separate textfile,
> to be read by read.table(). That is, with 'horizontal inhomogeneity'
> and 'vertical homogeneity' in the type of entries. I want to write
> something like
>
> Sex Respons
> Male 1
> Male 2
> Female 3
> Female 4
>
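Something like the following should work: write the data lines to a
temporary file with cat() and read that file back with read.table().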
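tmpfilename <- tempfile()
tmpfile <- file(tmpfilename, "w")
cat(
### here comes my data
    "Sex Respons",
    "Male 1",
    "Male 2",
    "Female 3",
    "Female 4",
### end of data input
    "",   # empty string, so that the last data line ends with a newline
    file = tmpfile, sep = "\n")
close(tmpfile)
### assign the result: source() does not print it by default
mydata <- read.table(tmpfilename, header = TRUE)
unlink(tmpfilename)   # remove the temporary file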
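
A variant of the same idea, as a minimal sketch: a textConnection() lets
read.table() read directly from a character vector, so no temporary file
is needed. The names datalines, con and dat are only illustrative.

```r
# The data, one string per line, in the same layout as a read.table() file.
datalines <- c(
    "Sex Respons",
    "Male 1",
    "Male 2",
    "Female 3",
    "Female 4")

# A text connection presents the character vector to read.table() as a file.
con <- textConnection(datalines)
dat <- read.table(con, header = TRUE)
close(con)   # text connections are created open, so close explicitly

str(dat)
```

The rest of the source()'ed file can then work on dat like on any other
data frame.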
best,
Torsten
> In effect, I am asking if there is some way I can convince
> read.table(), that the data is contained in the following n lines of
> text.
>
>
> ILLEGAL SOLUTIONS:
> I know I can simulate the behaviour by reading the columns of the
> dataframe one by one, and using data.frame() to glue them together.
> Like in
>
> data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
> Respons = c(1, 2, 3, 4))
>
> I do not like this solution, because it represents the data in a
> "transposed" way in the textfile, and this transposition makes the
> structure of the dataframe less transparent - at least to me. It
> becomes even less comprehensible if the Sex-factor above is written
> with the help of rep() or gl() or the like.
>
> I know I can make read.table() read from stdin, so I could type the
> dataframe at the prompt. That is against the spirit of the problem,
> as I describe below.
>
>
> I know I can make read.table() do the job if I split the data and the
> program code into different files. But as the purpose of the exercise
> is to distribute the data and the code to other people, splitting
> into several files is a complication.
>
>
> MOTIVATION: I frequently find myself distributing small chunks of code
> to my students, along with data on which the code can work.
>
> As an example, I might want to demonstrate how model.matrix() treats
> interactions, in a certain setting. For that I need a dataframe that
> is complex enough to exhibit the behaviour I want, but still so small
> that the model.matrix is easily understood. So I make such a
> dataframe.
>
> I am trying to distribute this dataframe along with my code, in a way
> that is as simple as possible to USE for the students (hence the
> one-file boundary condition) and to READ (hence the non-transposition
> boundary condition).
>
>
>
> Does anybody have any ideas?
>
>
> Ernst Hansen
> Department of Statistics
> University of Copenhagen
>