[R] How to modify XML documents and save changes

Duncan Temple Lang duncan at research.bell-labs.com
Tue Mar 4 15:22:02 CET 2003


Again, Stephen is right on the mark with his explanation of modifying
the XML objects and needing to put the modified values back into the
containing structure. Trees & graphs are slightly cumbersome in the S
language since it is not a reference-based language.  There are some
tricks one can use that are in the XML package for indexing them with
a "cursor", but these are (currently) only for use when constructing a
tree from scratch within S.
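
To make the copy semantics concrete, here is a minimal sketch that uses
plain nested lists as a stand-in for XML nodes (the list layout and the
names tree and child are illustrative assumptions, not the XML package's
own representation). Modifying an extracted child leaves the tree
untouched until the child is assigned back.

```r
# A toy "tree": each node is a list with a name and a list of children.
# This is only a stand-in for XML nodes, used to show S/R copy semantics.
tree <- list(name = "root",
             children = list(list(name = "tagname", children = list())))

# Extracting a child gives a copy, not a reference into the tree.
child <- tree$children[[1]]
child$children <- c(child$children,
                    list(list(name = "Norm", children = list())))

# The original tree is unchanged at this point.
before <- length(tree$children[[1]]$children)

# The modified child has to be put back into the containing structure.
tree$children[[1]] <- child
after <- length(tree$children[[1]]$children)
```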

> A little cumbersome, but doable. One other option is to use xmlEventParse
> and write a handler that would add the element after the one you're
> interested in.  I hope there is a better way, but haven't seen it yet. :-(

Here there is some good news. xmlEventParse is extremely low-level
relative to xmlTreeParse and is typically used for efficiency, to
minimize the amount of storage used when parsing the contents of an
XML document. The idea is that an S function gets invoked when an XML
tag is opened, and another when it is closed.
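
As a rough illustration of that open/close callback style, the sketch
below records tag names as they are opened and closed. The input string
and the variables opened and closed are made up for the example; the
handler names startElement and endElement are the ones xmlEventParse
looks for, and asText = TRUE is assumed so the XML can be given inline
rather than in a file.

```r
library(XML)

# Hypothetical inline document, used instead of a file on disk.
txt <- "<root><tagname>a</tagname><other/><tagname>b</tagname></root>"

opened <- character()
closed <- character()

handlers <- list(
  # Called each time a tag is opened.
  startElement = function(name, attrs, ...) {
    opened <<- c(opened, name)
  },
  # Called each time a tag is closed.
  endElement = function(name, ...) {
    closed <<- c(closed, name)
  }
)

# No tree is built; only the handlers see the document.
invisible(xmlEventParse(txt, handlers = handlers, asText = TRUE))
```

Nothing is kept in memory beyond what the handlers choose to store,
which is why this style suits large documents but is awkward for
editing a tree.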

xmlTreeParse() can also be used for this event-driven style of
programming.  When the XML parser encounters the close of a tag, it
looks in the handlers argument of xmlTreeParse for a suitable S
function to invoke.  If there is an element in that list of
functions whose name matches the tag name, it is invoked.  Thus if you
programmatically want to augment/modify the contents of an XML node,
you can supply a function that operates on that node.  The function
can return the resulting updated node and it will be inserted into the
overall tree as one would expect.

So, if one wants to _always_ add a node
  <Norm></Norm>
to each <tagname></tagname> node, one can use the following handler

myFun <- function(node,...) {
  node <- append.XMLNode(node, xmlNode("Norm"))
  node
}

and then use this in the command

xmlTreeParse("file.xml", handlers = list(tagname = myFun), asTree = TRUE)
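
For anyone who wants to try this without a file at hand, here is a
self-contained sketch of the same idea. The inline document and the
name addNorm are assumptions for illustration, and asText = TRUE is
used so the XML is read from a string.

```r
library(XML)

# Hypothetical inline document standing in for file.xml.
txt <- "<root><tagname>a</tagname><other/><tagname>b</tagname></root>"

# Handler invoked for every <tagname> node; it returns the node with
# an empty <Norm/> child appended.
addNorm <- function(node, ...) {
  append.xmlNode(node, xmlNode("Norm"))
}

doc <- xmlTreeParse(txt, handlers = list(tagname = addNorm),
                    asTree = TRUE, asText = TRUE)
root <- xmlRoot(doc)

# Names of the children of each top-level node after parsing.
childNames <- lapply(xmlChildren(root),
                     function(n) sapply(xmlChildren(n), xmlName))
```

Only nodes whose tag name matches an entry in the handlers list are
passed through a function; the <other/> node is left as parsed.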

One can also use XSL to do this, either directly from the command
line or from S via the Sxslt package.

If one only wants to modify particular <tagname> elements, it is
probably easiest to read the entire tree and modify it directly.
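
A sketch of that direct approach follows, under some assumptions: the
inline document, the id attribute used to pick out nodes, and the
output path are all invented for the example, and saveXML() is assumed
to accept an XMLNode (as it does in current versions of the package;
Stephen's write()/toString.XMLNode() route is an alternative).

```r
library(XML)

# Hypothetical document; only the <tagname> with id="2" is to be changed.
txt <- '<root><tagname id="1">a</tagname><tagname id="2">b</tagname></root>'

doc <- xmlTreeParse(txt, asText = TRUE)
root <- xmlRoot(doc)

# Find the positions of the children to modify.
kids <- xmlChildren(root)
isTarget <- sapply(kids, function(n) {
  xmlName(n) == "tagname" && identical(xmlGetAttr(n, "id"), "2")
})
idx <- which(isTarget)

# Append the new child and assign the modified copy back by index.
for (i in idx) {
  root[[i]] <- append.xmlNode(root[[i]], xmlNode("Norm"))
}

# Write the modified tree out to a temporary file.
out <- tempfile(fileext = ".xml")
saveXML(root, file = out)
```

The assignment back into root is the step that is easy to forget,
since append.xmlNode() returns a new node rather than changing the
tree in place.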

For people familiar with other XML parsers, the default xmlTreeParse
behaves like DOM and xmlEventParse behaves like SAX.  With handlers,
xmlTreeParse() provides a hybrid parser for XML somewhere between DOM
and SAX.

 D.


Stephen C. Upton wrote:
> Steffen,
> 
> As with most R objects, you're basically putting a copy of the R object into
> the new object. Any operation or function you apply to that object does not
> affect the original. Same goes for append.xmlNode - you're appending to the
> original and getting back another structure that is the original plus the
> new node. Finally, saveXML works on an object of class XMLInternalDocument,
> and doc (the object returned by xmlTreeParse) is an XMLDocument object.
> 
> Here's one suggestion:
> 1. Read in the doc object as you've done, but manipulate the structure as
> you would any other R object, e.g.,
> QTListNode [[1]] <-
> append.xmlNode(QTListNode[[1]],xmlNode(name="Norm",attrs=NULL))
> 2. you then need to modify that node within the root with this new, modified
> node, e.g. (assume I've assigned root <- xmlRoot(doc)),
> root[[whateverindextagnameis]] <- QTListNode[[1]]
> (I use the index here rather than the name, since it's unique - if you use a
> name, e.g., root[["tagname"]], that just adds a list element to root with
> name "tagname")
> 3. to write out, use write with toString.XMLNode,
> write(toString.XMLNode(root), file="out.xml")
> 
> Since this is probably a little cumbersome, suggest writing a function to do
> the finding, appending, and replacement.
> 
> A little cumbersome, but doable. One other option is to use xmlEventParse
> and write a handler that would add the element after the one you're
> interested in.  I hope there is a better way, but haven't seen it yet. :-(
> 
> HTH
> steve
> 
> 
> Steffen Durinck wrote:
> 
> > Dear,
> >
> > I want to read XML documents, add child nodes to some elements and store
> > everything back as an XML document.
> >
> > I've tried the following:
> >
> > doc <- xmlTreeParse("file.xml")
> > QTListNode<-xmlElementsByTagName(xmlRoot(doc)[[1]],"tagname")
> > append.xmlNode(QTListNode[[1]],newXMLNode(name ="Norm", attrs = NULL))
> > saveXML(doc, file = "out.xml", compression = 0, indent=T)
> >
> > This doesn't seem to work.
> > Can anyone help?
> >
> > Thanks,
> > Steffen
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 

-- 
_______________________________________________________________

Duncan Temple Lang                duncan at research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       
         http://cm.bell-labs.com/stat/duncan



