[R] another XML package question

Duncan Temple Lang dtemplelang at ucdavis.edu
Mon Sep 8 17:25:20 CEST 2008


Antje wrote:
> Hi Duncan,
> 
> thanks a lot for your explanations.
> 
> I tried the following now to understand a bit more:
> 
> data <- getNodeSet(doc, "//Data")
> xmlName(data[[1]])
> xmlName(xmlRoot(data[[1]]))
> xpathApply(data[[1]], "./*", xmlName)
> 
> Is it right that using "data" in the xpathApply() somehow sets the
> current node but does not change the root?

The answer is "it depends", specifically on which version of
the XML package you have.
In version 1.96-0 (the latest release), yes: the node you pass in
becomes the current (context) node and the root stays the same.
There is also code in the package (but overridden)
that creates a new temporary tree with the given node as the
root of the new tree (but without copying the nodes).
But keeping the original root is most likely what is desired.
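
As a rough sketch of that behaviour, assuming a current version of the
XML package in which xmlParse() is available, and using the sample
document quoted further down (the variable names txt, node, kids and
top are just for illustration):

```r
library(XML)

# Sample document from earlier in this thread.
txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# Use the first <data> node as the context node.
node <- getNodeSet(doc, "//data")[[1]]

# Relative query: names of the children of the node we passed in.
kids <- xpathSApply(node, "./*", xmlName)

# Absolute query: assuming the root is kept, this is still resolved
# against the top of the original document, not against 'node'.
top <- xpathSApply(node, "/*", xmlName)
```

If the root really is unchanged, top should name the document's root
element rather than the <data> node.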

> So looking for a subnode at all levels below my current node is not
> possible with the xPath syntax? 

It is possible:

  getNodeSet( data[[1]], ".//*")

does that. The // means "at any level". BTW, * matches only element
nodes, not text nodes, so you might want
          ".//*|.//text()|.//processing-instruction()"
for completeness (or maybe not!)
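
To see the difference, here is a small sketch that counts what each
query returns. It assumes a current XML package with xmlParse(), and
the names txt, node, elems and both are made up for the example:

```r
library(XML)

# Sample document from earlier in this thread, written without
# whitespace between the tags inside each <data> node.
txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
node <- getNodeSet(doc, "//data")[[1]]

# Element nodes only, at any depth below 'node'.
elems <- getNodeSet(node, ".//*")

# Element nodes plus text nodes, combined with the XPath union operator.
both <- getNodeSet(node, ".//*|.//text()")

# Compare how many nodes each query matched.
c(elements = length(elems), with_text = length(both))
```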

The key thing is that when you supply a node (and not the document)
as the first argument of getNodeSet() or xpathApply(), the XPath
query should be a relative query, e.g. .//* rather than //*.
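
A minimal sketch of why the leading dot matters, again assuming a
current XML package with xmlParse() and illustrative variable names
(txt, node, all_vals, sub_vals):

```r
library(XML)

# Sample document from earlier in this thread.
txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
node <- getNodeSet(doc, "//data")[[1]]

# Absolute query: starts at the document root even though we passed a
# node, so it finds the <val> nodes under every <data> node.
all_vals <- getNodeSet(node, "//val")

# Relative query: only the <val> nodes below 'node'.
sub_vals <- getNodeSet(node, ".//val")

# Compare how many nodes each query matched.
c(absolute = length(all_vals), relative = length(sub_vals))
```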

And the reason for keeping the root the same is so that we can do

  getNodeSet(data[[1]], "ancestor::*")
or
  getNodeSet(data[[1]], "../foo")

i.e. have an XPath expression that refers to nodes "higher" up the tree.
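
For instance, something along these lines should work on the sample
document. This is only a sketch assuming a current XML package with
xmlParse(); "../val" stands in for the "../foo" above, and the names
txt, v, anc and sib are just for illustration:

```r
library(XML)

# Sample document from earlier in this thread.
txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# Start from the first <val> node in the document.
v <- getNodeSet(doc, "//val")[[1]]

# Names of all ancestor elements of 'v'.
anc <- xpathSApply(v, "ancestor::*", xmlName)

# Go up to the parent and back down: the 'i' attribute of each <val>
# node that shares a parent with 'v' (including 'v' itself).
sib <- xpathSApply(v, "../val", xmlGetAttr, "i")
```

Neither query could reach those nodes if the node passed in were
treated as the root of a new tree.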

 D.

> (search on all levels starting from root
> is possible with "//nodename")
> 
> Antje
> 
> 
> 
> 
> Duncan Temple Lang wrote:
> 
> 
> Antje wrote:
>>>> Hi there,
>>>>
>>>> does anybody know how to return the xmlPath from a node?
>>>> For example, at several locations in the xml file, I have nodes with the
>>>> same name and I'd like to process only the nodes from a certain path.
>>>>
>>>> Any idea?
> 
> As with your previous question, there are ways to do this
> with either XPath queries or R functions that operate on
> the nodes from the earlier queries.
> 
> By "xmlPath", let's assume you mean the ordered collection of
> nodes from the node to the root node of the document,
> i.e. the collection of ancestor nodes.
> So using XPath, you could use
> 
>    a = getNodeSet( node, "ancestor::*")
> 
> where node is the R variable containing the node within the tree
> whose ancestors you want, e.g.
>     getNodeSet(doc, "//val")[[1]]
> 
> The nodes in a are in "reverse" order.
> 
> 
> You can do the same thing with the R function
> xmlParent().  To get the ancestors,
> 
>   tmp = xmlParent(node)
>   ans = list()
>   while( !is.null(tmp)) {
>       ans = c(ans, list(tmp))   # list() keeps each node as a single element
>       tmp = xmlParent(tmp)
>   }
> 
> and of course in your case you could terminate the loop
> at any point.
> 
> 
> But a different approach to the problem is to use a more specific
> XPath query in the first place to get only the nodes of interest.
> For example, to get the <val> nodes in the second <data> node of
> your example, you could use
> 
>   getNodeSet(doc, "//data[2]/val")
> 
> or to find all <val> nodes which have the attribute  i = "t2",
> 
>    getNodeSet(doc, "//val[@i='t2']")
> 
> Or to find all <val> nodes which have an ancestor with the
> attribute loc = "1",
> 
>      getNodeSet(doc, "//*[@loc='1']//val")
> 
> 
> 
> (
> The  sample XML document was
> 
> <root>
>    <data loc="1">
>      <val i="t1"> 22 </val>
>      <val i="t2"> 45 </val>
>    </data>
>    <data loc="2">
>      <val i="t1"> 44 </val>
>      <val i="t2"> 11 </val>
>    </data>
> </root>
> 
> )
> 
> 
>  D.
> 
>>>> Antje
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>



