[R] Problem with handling of attributes in xmlToList in XML package

santiago gil sg.ccnr at gmail.com
Sun Apr 14 20:09:12 CEST 2013


Hello all,

I have a problem with the way attributes are dealt with in the
function xmlToList(), and I haven't been able to figure it out for
days now.

Say I have a document (produced by nmap) like this:

> mydoc <- '<host starttime="1365204834" endtime="1365205860"><status state="up" reason="echo-reply" reason_ttl="127"/>
    <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
    <ports><port protocol="tcp" portid="135"><state state="open"
reason="syn-ack" reason_ttl="127"/><service name="msrpc"
product="Microsoft Windows RPC" ostype="Windows" method="probed"
conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
    <port protocol="tcp" portid="139"><state state="open"
reason="syn-ack" reason_ttl="127"/><service name="netbios-ssn"
method="probed" conf="10"/></port>
    </ports>
    <times srtt="647" rttvar="71" to="100000"/>
    </host>'

I want to store this as a list of lists, so I do:

library(XML)
mytree <- xmlTreeParse(mydoc, asText = TRUE)
myroot <- xmlRoot(mytree)
mylist <- xmlToList(myroot)

Now my problem is that when I want to fetch the attributes of the
services running on each port, the behavior is not consistent:

> mylist[["ports"]][[1]][["service"]]$.attrs["name"]
   name
"msrpc"
> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
Error in mylist[["ports"]][[2]][["service"]]$.attrs :
  $ operator is invalid for atomic vectors

I understand that the two service nodes are not defined the same way
in the document (the first has a child element, the second has only
attributes), but I think there should still be consistent behavior.
I've tried many combinations of parameters for xmlTreeParse(), but
nothing has helped. I can't find a way to retrieve the name of the
service consistently, regardless of whether the node has children or
not. Any tips?
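
To make the goal concrete, here is a minimal sketch of the kind of
uniform access I am after. It is not necessarily the intended usage.
Option 1 assumes it is acceptable to bypass xmlToList() and query the
parsed nodes with XPath. Option 2 keeps the list and uses get_attr(), a
hypothetical helper of my own (not part of the XML package), applied to
hand-built stand-ins for the two shapes seen in the session above.

```r
library(XML)

# Trimmed copy of the nmap fragment: one <service> has a child, one does not.
mydoc <- '<host><ports>
  <port protocol="tcp" portid="135"><service name="msrpc" method="probed" conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
  <port protocol="tcp" portid="139"><service name="netbios-ssn" method="probed" conf="10"/></port>
</ports></host>'

# Option 1: skip xmlToList() and read the attribute from the nodes themselves.
doc <- xmlParse(mydoc, asText = TRUE)
service_names <- xpathSApply(doc, "//port/service", xmlGetAttr, "name")
print(service_names)

# Option 2: keep the list, but go through an accessor that accepts both shapes.
# get_attr() is a hypothetical helper, not part of the XML package.
get_attr <- function(x, attr) {
  # A list carries its attributes in ".attrs"; an atomic vector IS the attributes.
  attrs <- if (is.list(x)) x[[".attrs"]] else x
  unname(attrs[attr])
}

# Hand-built stand-ins for the two shapes (assumed, not produced by xmlToList here).
with_child    <- list(cpe = "cpe:/o:microsoft:windows",
                      .attrs = c(name = "msrpc", method = "probed", conf = "10"))
without_child <- c(name = "netbios-ssn", method = "probed", conf = "10")

print(get_attr(with_child, "name"))
print(get_attr(without_child, "name"))
```

With the list approach, the call would look like
get_attr(mylist[["ports"]][[2]][["service"]], "name"), assuming the
element is either a list with a ".attrs" entry or a plain named
character vector.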

All the best,


S.G.

--
-------------------------------------------------------------------------------
http://barabasilab.neu.edu/people/gil/


