[R] R and XML -- a near perfect combination?

Duncan Temple Lang duncan at rice.research.bell-labs.com
Wed Dec 1 15:55:41 CET 1999

The great thing about data exchange is that at least two systems have
to be involved (for it to be non-trivial!). John Chambers, myself and
others have been discussing _joint_ XML integration into several
projects: Omegahat, R and S.

The interesting thing from the general perspective (rather than that
of a particular project and pair of interacting applications) lies in
defining some DTDs that people are comfortable using. This applies not
only to data frames but also to model specifications, results, etc.

From the implementation perspective of reading XML, Omegahat gets this
automatically. It would be nice to have a single implementation shared
by both R and S, and this was on my list of things to do. Irrespective
of the choice of C/C++ parsing system, one approach I was thinking of
in R is to use a closure that is associated with a DTD. Thinking out
loud, the idea is to have functions in the closure that correspond to
the different elements in the DTD. As the parser discovers each
element instance, it calls the associated function in the closure (or
a default one) with the attribute list and potentially the
"identifier" for the parent node in the resulting tree (although R and
S aren't exactly designed for trees).
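A rough sketch of that dispatch idea follows. It is written in Python with the standard-library expat binding, purely for illustration; it is not R code. A dictionary of functions stands in for the R closure tied to a DTD. The element names (`dataframe`, `variable`) and the helper names `make_handlers` and `parse` are hypothetical. Each start tag is routed to its element's function, or to a default one, together with its attributes and an integer id for the parent node.

```python
import xml.parsers.expat


def make_handlers(log):
    """Return (handlers, default): per-element callbacks for one hypothetical DTD.

    The dict plays the role of the R closure: one function per element name.
    Each callback receives the element's attributes and the id of its parent node.
    """
    def dataframe(attrs, parent):
        log.append(("dataframe", attrs.get("name"), parent))

    def variable(attrs, parent):
        log.append(("variable", attrs.get("name"), parent))

    def default(name, attrs, parent):
        # Fallback for elements the DTD-specific table does not cover.
        log.append(("other:" + name, attrs.get("name"), parent))

    return {"dataframe": dataframe, "variable": variable}, default


def parse(text, handlers, default):
    """Run expat over `text`, dispatching each start tag to its handler."""
    parser = xml.parsers.expat.ParserCreate()
    stack = []     # ids of the currently open elements; the top is the parent
    next_id = [0]  # counter giving each node an integer "identifier"

    def on_start(name, attrs):
        parent = stack[-1] if stack else None
        next_id[0] += 1
        if name in handlers:
            handlers[name](attrs, parent)
        else:
            default(name, attrs, parent)
        stack.append(next_id[0])

    def on_end(name):
        stack.pop()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(text, True)
```

For example, parsing `<dataframe name="cars"><variable name="speed"/><note/></dataframe>` calls `dataframe` with no parent. It calls `variable` with parent id 1, and `note` falls through to the default handler, also with parent id 1. Only a stack of open-element ids is kept, so no full tree has to be built in the host language.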

(If this doesn't make any sense, it could be attributed to too little


> From: rossini at biostat.washington.edu (A.J. Rossini)
> Date: 30 Nov 1999 15:59:59 -0800
> >>>>> "c" == cys  <cys at www.approximity.com> writes:
>     c> Has anybody already written an XML parser for R?  We will
>     c> have tons of data interchange with all sorts of other programs,
>     c> and XML is good for giving meaning to raw data.
>     c> Any pointers/comments would be highly appreciated.
> It's a nice format, if you know what you are doing.  My main thought
> for what you are proposing (data exchange of datasets) is to write a
> converter from your XML format to a text representation of the
> corresponding data.frame.  It's reasonably simple, and you are free
> to write the parser in whatever language you like (C++, Java, Python,
> whatever).  Plus, you can grow it incrementally (a simple list is
> easy, adding row/col names isn't too hard, etc.).  Do it using pipes,
> and you will be fine for Unix and NT.
> The only problem with a generic parser is the need for XML-to-XML
> conversion, since you can't be sure that everyone wants to use the
> DTD (or style) that you particularly like.
> best,
> -tony
> -- 
> A.J. Rossini			Research Assistant Professor of Biostatistics 
> Center for AIDS Research/HMC	Biostatistics/Univ. of Washington
> Box 359931			Box 357232
> 206-731-3647 (3693=fax)		206-543-1044 (3286=fax)
> rossini at u.washington.edu	rossini at biostat.washington.edu
> http://www.biostat.washington.edu/~rossini/
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
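Tony's suggestion of an external converter connected by a pipe can be sketched roughly as follows, again in Python. The XML layout is a made-up example and not any agreed DTD: a `<dataframe>` holding `<row name="...">` elements, each containing `<cell col="...">` values. The script reads XML on standard input and writes a whitespace-separated table. Its header line has one field fewer than the data lines, which R's `read.table(..., header = TRUE)` treats as meaning the first column holds row names.

```python
import sys
import xml.etree.ElementTree as ET


def xml_to_table(text):
    """Convert a hypothetical <dataframe> document into whitespace-separated text.

    Assumed layout (illustrative only, not a standard DTD):
      <dataframe>
        <row name="r1"><cell col="speed">4</cell><cell col="dist">2</cell></row>
        ...
      </dataframe>
    Values are assumed not to contain whitespace.
    """
    if not text.strip():
        return ""
    root = ET.fromstring(text)
    rows = root.findall("row")
    if not rows:
        return ""
    # Column names come from the first row; the header has no row-name field.
    columns = [cell.get("col") for cell in rows[0].findall("cell")]
    lines = [" ".join(columns)]
    for row in rows:
        values = [(cell.text or "").strip() for cell in row.findall("cell")]
        lines.append(" ".join([row.get("name")] + values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and not sys.stdin.isatty():
    # Act as a filter only when input is piped in: XML on stdin, table on stdout.
    sys.stdout.write(xml_to_table(sys.stdin.read()))
```

Assuming the script is saved under a hypothetical name such as `xml2table.py`, R could read the result with `read.table(pipe("python xml2table.py < cars.xml"), header = TRUE)`. Growing the format (row names, types, missing values) then only touches the converter.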


Duncan Temple Lang                duncan at research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       

      "Languages shape the way we think, and determine what 
       we can think about."        
                                      Benjamin Whorf