[R] How to filter xml value in R?

Ben Tupper btupper at bigelow.org
Wed Nov 14 14:37:03 CET 2012


Hi,

On Nov 13, 2012, at 11:55 PM, Manish Gupta wrote:

> Hi,
> 
> I have one xml file. 
> 
> <Class>
>     <Node1 code ="1"> First node </Node1>
>     <Node2 code ="1"> Second node </Node2>
>     <Node3 code ="1"> Third node </Node3>
>    <Node1 code ="2"> Fourth node </Node1>
> </Class>
> 
> for (i in 1:xmlSize())
> {
>     print(Class[i])   # how can i filter Node1 ?
> }
> 
> by using xmlChildren(Class), i get nodes of Class. How can i filter Node1
> and print other elements of Class node?
> 

I think the XML package's "[" and "[[" methods are what you are looking for.  They subscript the children of a node, much as indexing the result of xmlChildren would.  You needn't loop through looking for the match - instead, just subscript by the node name.

library(XML)

txt <- "<Class> <Node1 code =\"1\"> First node </Node1> <Node2 code =\"1\"> Second node </Node2> <Node3 code =\"1\"> Third node </Node3> <Node1 code =\"2\"> Fourth node </Node1> </Class>"

# asText = TRUE makes it explicit that txt is XML content, not a file name
node0 <- xmlRoot( xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE) )

node1 <- node0[["Node1"]]

From this point, you can use xmlValue or xmlAttrs to get at the value or attributes of the node.  (Or, if node1 has children, you simply drill down using "[[" and "[" as required.)
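
As a rough sketch of that step, something like the following should pull the text and the attribute out of the first Node1.  It re-parses the same example string so it can be run on its own; the trimws() call is my own addition to drop the padding spaces around the text.

```r
library(XML)

txt <- "<Class> <Node1 code =\"1\"> First node </Node1> <Node2 code =\"1\"> Second node </Node2> <Node3 code =\"1\"> Third node </Node3> <Node1 code =\"2\"> Fourth node </Node1> </Class>"
node0 <- xmlRoot(xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE))

# "[[" returns the first child named Node1
node1 <- node0[["Node1"]]

# Text content of the node, with the surrounding spaces trimmed
node1.value <- trimws(xmlValue(node1))

# All attributes as a named character vector, or a single attribute by name
node1.attrs <- xmlAttrs(node1)
node1.code  <- xmlGetAttr(node1, "code")

print(node1.value)
print(node1.attrs)
print(node1.code)
```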

If you have more than one child of type "Node1", as your example does, then the above would return just the first one.  To get them all you would use "[" instead of "[[".

node1.all <- node0["Node1"]
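
Here is a small sketch of how you might then work with the result, and how to go the other way and keep everything except Node1, which I think is what you were asking.  The names node1.values, node1.codes, kids and others are just ones I made up for illustration, and filtering on names() assumes the children are named by their element names, as they are for this example.

```r
library(XML)

txt <- "<Class> <Node1 code =\"1\"> First node </Node1> <Node2 code =\"1\"> Second node </Node2> <Node3 code =\"1\"> Third node </Node3> <Node1 code =\"2\"> Fourth node </Node1> </Class>"
node0 <- xmlRoot(xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE))

# "[" returns a list holding every child named Node1
node1.all <- node0["Node1"]

# Apply xmlValue / xmlGetAttr to each node in the list
node1.values <- trimws(sapply(node1.all, xmlValue))
node1.codes  <- sapply(node1.all, xmlGetAttr, "code")

# To drop Node1 instead, filter the children by name
kids   <- xmlChildren(node0)
others <- kids[names(kids) != "Node1"]
other.values <- trimws(sapply(others, xmlValue))

print(node1.values)
print(node1.codes)
print(other.values)
```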


Cheers,
Ben


Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org



