[R] Example for parsing XML file?

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 20 23:14:23 CEST 2009


Brigid Mooney wrote:
> Hi,
>
> I am trying to parse XML files and read them into R as a data frame,
> but have been unable to find examples which I could apply
> successfully.
>
> I'm afraid I don't know much about XML, which makes this all the more
> difficult.  If someone could point me in the right direction to a
> resource (preferably with an example or two), it would be greatly
> appreciated.
>
> Here is a snippet from one of the XML files that I am looking to read,
> and I am aiming to be able to get it into a data frame with columns N,
> T, A, B, C as in the 2nd level of the hierarchy.
>

There might be a simpler approach, but this seems to work:

    library(XML)

    # asText = TRUE states explicitly that the argument is XML content, not a file name
    input = xmlParse(
    '<?xml version="1.0" encoding="utf-8" ?>
    <C S="UnitA" D="1/3/2007" C="24745" F="24648">
      <T N="1" T="9:30:13 AM" A="30.05" B="29.85" C="30.05" />
      <T N="2" T="9:31:05 AM" A="29.89" B="29.78" C="30.05" />
      <T N="3" T="9:31:05 AM" A="29.9" B="29.86" C="29.87" />
      <T N="4" T="9:31:05 AM" A="29.86" B="29.86" C="29.87" />
      <T N="5" T="9:31:05 AM" A="29.89" B="29.86" C="29.87" />
      <T N="6" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
      <T N="7" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
      <T N="8" T="9:31:06 AM" A="29.89" B="29.85" C="29.86" />
    </C>', asText = TRUE)

    (output = data.frame(t(xpathSApply(input, '//T', xpathSApply, '@*'))))
    #   N          T     A     B     C
    # 1 1 9:30:13 AM 30.05 29.85 30.05
    # 2 2 9:31:05 AM 29.89 29.78 30.05
    # 3 3 9:31:05 AM  29.9 29.86 29.87
    # 4 4 9:31:05 AM 29.86 29.86 29.87
    # 5 5 9:31:05 AM 29.89 29.86 29.87
    # 6 6 9:31:06 AM 29.89 29.85 29.86
    # 7 7 9:31:06 AM 29.89 29.85 29.86
    # 8 8 9:31:06 AM 29.89 29.85 29.86

    output$N
    # [1] 1 2 3 4 5 6 7 8
    # Levels: 1 2 3 4 5 6 7 8

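Note that every column comes back as a factor (see output$N above), so you
may need to convert the columns to the types you actually want.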
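
A rough sketch of one way to do that conversion, not necessarily the
neatest. To keep it self-contained, it uses a hand-built stand-in for
`output` (first three rows only) rather than the parsed XML; the choice of
integer for N and numeric for A, B, C is my assumption about what is
wanted. In R 4.0.0 and later, data.frame() defaults to
stringsAsFactors = FALSE, so there the columns would be character rather
than factor, and the as.character() step is then harmless but redundant.

```r
# Stand-in for the `output` data frame (first three rows only), with every
# column stored as a factor, as in the printed result.
output = data.frame(
  N = c("1", "2", "3"),
  T = c("9:30:13 AM", "9:31:05 AM", "9:31:05 AM"),
  A = c("30.05", "29.89", "29.9"),
  B = c("29.85", "29.78", "29.86"),
  C = c("30.05", "30.05", "29.87"),
  stringsAsFactors = TRUE
)

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the printed values.
output$N = as.integer(as.character(output$N))
for (col in c("A", "B", "C")) {
  output[[col]] = as.numeric(as.character(output[[col]]))
}

# Keep the time of day as plain text; turning it into a date-time would
# also need the date, which sits in the D attribute of the root element.
output$T = as.character(output$T)

str(output)
```

The same lines should work unchanged on the full `output` built from the
XML, provided the column names are N, T, A, B, C as shown.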

vQ



