[R] Need help extracting info from XML file using XML package
Duncan Temple Lang
duncan at wald.ucdavis.edu
Mon Mar 2 14:14:05 CET 2009
Wacek Kusnierczyk wrote:
> Don MacQueen wrote:
>> I have an XML file that has within it the coordinates of some polygons
>> that I would like to extract and use in R. The polygons are nested
>> rather deeply. For example, I found by trial and error that I can
>> extract the coordinates of one of them using functions from the XML
>> package:
>>
>> doc <- xmlInternalTreeParse('doc.kml')
>> docroot <- xmlRoot(doc)
>> pgon <-
>
> try
>
> lapply(
> xpathSApply(doc, '//Polygon',
> xpathSApply, '//coordinates', function(node)
> strsplit(xmlValue(node), split=',|\\s+')),
> as.numeric)
Just for the record, I think the XPath expression in the
second xpathSApply would need to be
".//coordinates"
to start searching from the previously matched Polygon node.
Otherwise, the search starts from the top of the document again.
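To make the difference concrete, here is a small sketch on made-up data. The two-polygon document is invented for illustration, and it assumes the file declares no default XML namespace (real KML files often do, in which case the paths would need a namespace prefix).

```r
library(XML)

# Toy document, invented for illustration: two polygons, one
# <coordinates> node each, and no XML namespace.
kml <- '<Document>
  <Polygon id="1"><outerBoundaryIs><LinearRing><coordinates>
    1,2,3 4,5,6
  </coordinates></LinearRing></outerBoundaryIs></Polygon>
  <Polygon id="2"><outerBoundaryIs><LinearRing><coordinates>
    7,8,9 10,11,12
  </coordinates></LinearRing></outerBoundaryIs></Polygon>
</Document>'

doc <- xmlInternalTreeParse(kml, asText = TRUE)
polygons <- getNodeSet(doc, "//Polygon")

# Absolute path: evaluated from the top of the document, even though
# a single Polygon node is passed as the starting point.
n_absolute <- length(getNodeSet(polygons[[1]], "//coordinates"))

# Relative path: the leading "." restricts the search to descendants
# of the Polygon node that was passed in.
n_relative <- length(getNodeSet(polygons[[1]], ".//coordinates"))

print(c(absolute = n_absolute, relative = n_relative))
```

With the absolute form, every Polygon would report the coordinates of all polygons in the file rather than just its own.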
However, it would seem that
xpathSApply(doc, "//Polygon//coordinates",
function(node) strsplit(.....))
would be more direct, i.e. fetch the coordinates nodes in a single
XPath expression.
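As a rough sketch of that single-expression approach, again on invented data: parse_coords is a hypothetical helper, not part of the XML package, and it assumes every coordinate tuple has exactly three values (longitude, latitude, altitude) and that the document has no default namespace.

```r
library(XML)

# Toy document, invented for illustration (no XML namespace).
kml <- '<Document>
  <Polygon id="1"><outerBoundaryIs><LinearRing><coordinates>
    1,2,3 4,5,6
  </coordinates></LinearRing></outerBoundaryIs></Polygon>
  <Polygon id="2"><outerBoundaryIs><LinearRing><coordinates>
    7,8,9 10,11,12
  </coordinates></LinearRing></outerBoundaryIs></Polygon>
</Document>'

doc <- xmlInternalTreeParse(kml, asText = TRUE)

# Hypothetical helper: turn the text of one <coordinates> node into a
# numeric matrix with one row per point.
parse_coords <- function(node) {
  # Split on commas and any run of whitespace (including newlines).
  tokens <- strsplit(xmlValue(node), "[,[:space:]]+")[[1]]
  # Leading whitespace leaves an empty first token; drop it.
  tokens <- tokens[nzchar(tokens)]
  # Assumes three values per point: lon, lat, alt.
  matrix(as.numeric(tokens), ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("lon", "lat", "alt")))
}

# One XPath expression fetches every coordinates node under any Polygon.
# xpathApply (rather than xpathSApply) keeps the result as a list, so
# equally sized polygons are not collapsed into a single matrix.
coords <- xpathApply(doc, "//Polygon//coordinates", parse_coords)

print(coords[[1]])
```

The result is a list with one matrix per coordinates node, which should be easy to hand to polygon() or similar for plotting.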
D.
>
> which should find all polygon nodes, extract the coordinates node for
> each polygon separately, split the coordinates string by comma and
> convert to a numeric vector, and then report a list of such vectors, one
> vector per polygon.
>
> i've tried it on some dummy data made up from your example below. the
> xpath patterns may need to be adjusted, depending on the actual
> structure of your xml file, as may the strsplit pattern.
>
> vQ
>
>
>
>
>
>
>> but this is hardly general!
>>
>> I'm hoping there is some relatively straightforward way to use
>> functions from the XML package to recursively descend the structure
>> and return the text strings representing the polygons into, say, a
>> list with as many elements as there are polygons. I've been looking at
>> several XML documentation files downloaded from
>> http://www.omegahat.org/RSXML/ , but since my understanding of XML is
>> weak at best, I'm having trouble. I can deal with converting the text
>> strings to an R object suitable for plotting etc.
>>
>>
>> Here's a look at the structure of this file
>>
>> graphics[5]% grep Polygon doc.kml
>> <Polygon id="15342">
>> </Polygon>
>> <Polygon id="1073">
>> </Polygon>
>> <Polygon id="16508">
>> </Polygon>
>> <Polygon id="18665">
>> </Polygon>
>> <Polygon id="32903">
>> </Polygon>
>> <Polygon id="5232">
>> </Polygon>
>>
>> And each of the <Polygon> </Polygon> pairs has <coordinates> as per
>> this example:
>>
>>
>> <Polygon id="15342">
>> <outerBoundaryIs>
>> <LinearRing id="11467">
>> <coordinates>
>> -23.679835352296,30.263840290388,5.000000000000001
>> -23.68138782285701,30.264740875186,5.000000000000001
>> [snip]
>> -23.679835352296,30.263840290388,5.000000000000001
>> -23.679835352296,30.263840290388,5.000000000000001 </coordinates>
>> </LinearRing>
>> </outerBoundaryIs>
>> </Polygon>
>>
>>
>> Thanks!
>> -Don
>>
>>
>> p.s.
>> There is a lot of other stuff in this file, i.e, some points, and
>> attributes of the points such as color, as well as a legend describing
>> what the polygons mean, but I can get by without all that stuff, at
>> least for now.
>>
>> Note also that readOGR() would in principle work, but the underlying
>> OGR libraries have some limitations that this file exceeds. Per info
>> at http://www.gdal.org/ogr/drv_kml.html.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.