[R] Programcode and data in the same textfile
Duncan Temple Lang
duncan at research.bell-labs.com
Thu Jun 12 20:05:48 CEST 2003
Hi Ernst.
I have found myself in a similar situation where I want to send
code to someone with annotations that explain the different pieces
in richer ways than comments will permit.
If you want to contain both data and code within a single document,
you will need to have some way to identify which is which so that the
software can distinguish the different elements of the document. This
is precisely what a markup language does. And rather than inventing ad
hoc conventions, why not simply use a real markup language? XML is the most
natural one, and you could write something like
<doc>
<data>
Sex Response
Male 1
Male 2
Female 3
Female 4
</data>
<code>
......
</code>
</doc>
Using the XML package, you can read the document into R
and do what you will with it.
To read the data,
library(XML)
tr = xmlRoot(xmlTreeParse("myFile"))
read.table(textConnection(xmlValue(tr[["data"]])), header=TRUE)
and to access the code text
xmlValue(tr[["code"]])
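As a minimal sketch of the whole round trip, assuming the XML package is
installed: the names doc_text, path, dat and result are hypothetical, and
the tapply() line is only an illustrative stand-in for whatever code the
<code> element would really hold. It writes the document to a temporary
file, reads the data section with read.table(), then parses and evaluates
the code section against that data.

```r
library(XML)  # assumption: the XML package is installed

# Hypothetical document; the body of <code> is an illustrative stand-in
doc_text <- '<doc>
<data>
Sex Response
Male 1
Male 2
Female 3
Female 4
</data>
<code>
tapply(dat$Response, dat$Sex, mean)
</code>
</doc>'

# Write the document to a temporary file so it can be parsed like "myFile"
path <- tempfile(fileext = ".xml")
writeLines(doc_text, path)

tr <- xmlRoot(xmlTreeParse(path))

# Data section: hand the text of <data> to read.table via a text connection
con <- textConnection(xmlValue(tr[["data"]]))
dat <- read.table(con, header = TRUE)
close(con)

# Code section: parse the text of <code> and evaluate it against dat
result <- eval(parse(text = xmlValue(tr[["code"]])))
print(result)
```

The data stays in the same row-by-row layout that read.table() expects
from a separate file, and the code only sees it as the data frame dat.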
I have several variants of this style of thing that I
occasionally add to the SXMLDocs package. For me at least, it is
easy to write handlers to process the different kinds of content and to
leave XML to identify them within the document.
Hope this provides some ideas for thinking about the problem
in a slightly broader light.
D.
Ernst Hansen wrote:
> I have the following problem. It is not of earthshaking importance,
> but still I have spent a considerable amount of time thinking about
> it.
>
> PROBLEM: Is there any way I can have a single textfile that contains
> both
>
> a) data
>
> b) programcode
>
> The program should act on the data, if the textfile is source()'ed
> into R.
>
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly
> the same format as I would use, if I had data in a separate textfile,
> to be read by read.table(). That is, with 'horizontal inhomogeneity'
> and 'vertical homogeneity' in the type of entries. I want to write
> something like
>
> Sex Respons
> Male 1
> Male 2
> Female 3
> Female 4
>
> In effect, I am asking if there is some way I can convince
> read.table(), that the data is contained in the following n lines of
> text.
>
>
> ILLEGAL SOLUTIONS:
> I know I can simulate the behaviour by reading the columns of the
> dataframe one by one, and using data.frame() to glue them together.
> Like in
>
> data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
> Respons = c(1, 2, 3, 4))
>
> I do not like this solution, because it represents the data in a
> "transposed" way in the textfile, and this transposition makes the
> structure of the dataframe less transparent - at least to me. It
> becomes even less comprehensible if the Sex-factor above is written
> with the help of rep() or gl() or the like.
>
> I know I can make read.table() read from stdin, so I could type the
> dataframe at the prompt. That is against the spirit of the problem,
> as I describe below.
>
>
> I know I can make read.table() do the job, if I split the data and the
> programcode into different files. But as the purpose of the exercise
> is to distribute the data and the code to other people, splitting
> into several files is a complication.
>
>
> MOTIVATION: I frequently find myself distributing small chunks of code
> to my students, along with data on which the code can work.
>
> As an example, I might want to demonstrate how model.matrix() treats
> interactions, in a certain setting. For that I need a dataframe that
> is complex enough to exhibit the behaviour I want, but still so small
> that the model.matrix is easily understood. So I make such a
> dataframe.
>
> I am trying to distribute this dataframe along with my code, in a way
> that is as simple as possible to USE for the students (hence the
> one-file boundary condition) and to READ (hence the non-transposition
> boundary condition).
>
>
>
> Does anybody have any ideas?
>
>
> Ernst Hansen
> Department of Statistics
> University of Copenhagen
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
--
_______________________________________________________________
Duncan Temple Lang duncan at research.bell-labs.com
Bell Labs, Lucent Technologies office: (908)582-3217
700 Mountain Avenue, Room 2C-259 fax: (908)582-3340
Murray Hill, NJ 07974-2070
http://cm.bell-labs.com/stat/duncan