[R] Parsing XML File

jim holtman jholtman at gmail.com
Sun Oct 11 21:54:10 CEST 2015


I'm not sure exactly what you want, since you did not show the expected output,
but this will extract the attributes of every AccVal node in the document:

> #####################################################################
>  library(XML)
>
>  xmlfile=xmlParse("/temp/account.xml")
>
>  class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
[1] "XMLInternalDocument" "XMLAbstractDocument"
>  xmltop = xmlRoot(xmlfile) #gives content of root
>
>  #####  try this  ##############
>
>  accts <- sapply(getNodeSet(xmltop, "//AccVal"), xmlAttrs)
>
>  # create data.frame
>  accts_df <- as.data.frame(t(accts), stringsAsFactors = FALSE)
>  str(accts_df)
'data.frame':   364 obs. of  4 variables:
 $ key        : chr  "AccountCode" "AccountReady" "AccountType" "AccruedCash" ...
 $ val        : chr  "DU108063" "true" "CORPORATION" "0" ...
 $ currency   : chr  "" "" "" "AUD" ...
 $ accountName: chr  "DU108063" "DU108063" "DU108063" "DU108063" ...
>  head(accts_df)
           key         val currency accountName
1  AccountCode    DU108063             DU108063
2 AccountReady        true             DU108063
3  AccountType CORPORATION             DU108063
4  AccruedCash           0      AUD    DU108063
5  AccruedCash           0     BASE    DU108063
6  AccruedCash           0      CAD    DU108063
>
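For readers without access to the Dropbox file, here is a self-contained sketch of the same idea. The XML string is a hypothetical excerpt that only mimics the attributes visible in the str() output above (key, val, currency, accountName). The root element name AccountData, the spelling of the UnrealizedPnL key, its values and the threshold of 1000 are all assumptions for illustration. The sketch also replaces sapply() + t() with an explicit row-binding step, because sapply() only simplifies to a matrix when every AccVal node carries the same attributes in the same order. It ends with the filter asked about in the original question.

```r
library(XML)

# HYPOTHETICAL sample: root name, UnrealizedPnL rows and their values are
# invented; only the attribute names come from the str() output above.
sample_xml <- '<AccountData>
  <AccVal key="AccountCode" val="DU108063" currency="" accountName="DU108063"/>
  <AccVal key="AccruedCash" val="0" currency="AUD" accountName="DU108063"/>
  <AccVal key="UnrealizedPnL" val="1500.25" currency="USD" accountName="DU108063"/>
  <AccVal key="UnrealizedPnL" val="-200" currency="EUR" accountName="DU108063"/>
</AccountData>'

doc <- xmlParse(sample_xml, asText = TRUE)

# One named character vector of attributes per <AccVal> node.
attr_list <- lapply(getNodeSet(doc, "//AccVal"), xmlAttrs)

# Align every node on the union of attribute names (missing ones become NA)
# before binding, so columns cannot shift if a node differs in its attributes.
cols <- unique(unlist(lapply(attr_list, names)))
rows <- lapply(attr_list, function(a) unname(a[cols]))
accts_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(accts_df) <- cols

# Numeric version of val; non-numeric values such as "DU108063" become NA.
accts_df$val_num <- suppressWarnings(as.numeric(accts_df$val))

threshold <- 1000  # hypothetical threshold
# which() drops the NA comparisons, keeping only rows that really exceed it.
hits <- accts_df[which(accts_df$key == "UnrealizedPnL" &
                         accts_df$val_num > threshold), ]

# All attributes of every account that has at least one such row.
account_rows <- accts_df[accts_df$accountName %in% unique(hits$accountName), ]
print(account_rows)
```

On the real data, replace the sample with xmlParse("account.xml") and run unique(accts_df$key) first to confirm how the unrealized PnL key is actually spelled in the file.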


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Oct 11, 2015 at 3:10 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

> Dear All,
> I am struggling to parse the XML file available at
>
> https://www.dropbox.com/s/i4ld5qa26hwrhj7/account.xml?dl=0
>
> Essentially, I would like to convert it to a data.frame so that I can
> manipulate it in R and find all the attributes of any account whose
> unrealizedPNL goes above a threshold.
> I saved the file as account.xml and, after looking around on the
> web, put together the following script:
>
>
> #####################################################################
> library(XML)
>
> xmlfile=xmlParse("account.xml")
>
> class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
> xmltop = xmlRoot(xmlfile) #gives content of root
> class(xmltop) #"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
> xmlName(xmltop) #name of the root node
> xmlSize(xmltop) #number of children of the root node
> xmlName(xmltop[[1]]) #name of the root's first child
>
> # have a look at the content of the first child entry
> xmltop[[1]]
> # have a look at the content of the 2nd child entry
> xmltop[[2]]
> #Root Node's children
> number <- xmlSize(xmltop[[1]]) #number of nodes in each child
> name <- xmlSApply(xmltop[[1]], xmlName) #name(s)
> attribute <- xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
> size <- xmlSApply(xmltop[[1]], xmlSize) #size
>
>
> values <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
> #####################################################################
>
> which is leading me nowhere.
> Any suggestion is appreciated.
> Cheers
>
> Lorenzo
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
>




More information about the R-help mailing list