[R] Need help extracting info from XML file using XML package
Romain Francois
romain.francois at dbmail.com
Mon Mar 2 12:54:37 CET 2009
Hi,
You might also want to check out R4X:
# install.packages("R4X", repos="http://R-Forge.R-project.org")
require( "R4X" )
x <- xml("http://code.google.com/apis/kml/documentation/KML_Samples.kml")
coords <- x["////Polygon///coordinates/#" ]
data <- sapply( strsplit( coords, "(,|\\s+)" ), as.numeric )
Romain
--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
Wacek Kusnierczyk wrote:
> Don MacQueen wrote:
>
>> I have an XML file that has within it the coordinates of some polygons
>> that I would like to extract and use in R. The polygons are nested
>> rather deeply. For example, I found by trial and error that I can
>> extract the coordinates of one of them using functions from the XML
>> package:
>>
>> doc <- xmlInternalTreeParse('doc.kml')
>> docroot <- xmlRoot(doc)
>> pgon <-
>>
>
> try
>
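> library(XML)
> lapply(
>   xpathSApply(doc, '//Polygon',
>     # './/' restricts the search to the current Polygon node;
>     # a bare '//' would search the whole document from every node
>     xpathSApply, './/coordinates', function(node)
>       strsplit(gsub('^\\s+|\\s+$', '', xmlValue(node)), split=',|\\s+')),
>   as.numeric)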
>
> which should find all polygon nodes, extract the coordinates node for
> each polygon separately, split the coordinates string on commas and
> whitespace, convert it to a numeric vector, and then return a list of
> such vectors, one vector per polygon.
>
> i've tried it on some dummy data made up from your example below. the
> xpath patterns may need to be adjusted, depending on the actual
> structure of your xml file, as may the strsplit pattern.
>
> vQ
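A self-contained sketch of the same idea that runs without doc.kml. The two-polygon KML string below is made up for illustration, and the sketch uses getNodeSet() and xmlParse() from the XML package instead of nested xpathSApply() calls. It also shows why the inner path should start with './/': an XPath expression beginning with '//' is absolute, so it searches the whole document even when evaluated from a Polygon node. Real KML files often declare a default namespace (xmlns="http://www.opengis.net/kml/2.2"); in that case unprefixed paths such as //Polygon may match nothing, and a prefix registered through the namespaces argument of getNodeSet() is probably needed.

```r
library(XML)

# Made-up two-polygon KML fragment (no namespace), for illustration only
kml <- '<Document>
  <Placemark><Polygon id="1"><outerBoundaryIs><LinearRing><coordinates>
    1,2,0 3,4,0 1,2,0
  </coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark>
  <Placemark><Polygon id="2"><outerBoundaryIs><LinearRing><coordinates>
    5,6,0 7,8,0 9,10,0 5,6,0
  </coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark>
</Document>'
doc <- xmlParse(kml, asText = TRUE)

polys <- getNodeSet(doc, "//Polygon")

# '//coordinates' evaluated from the first Polygon still sees the whole document
n_abs <- length(getNodeSet(polys[[1]], "//coordinates"))
# './/coordinates' only looks below that Polygon
n_rel <- length(getNodeSet(polys[[1]], ".//coordinates"))

# One numeric vector per polygon: lon, lat, alt, lon, lat, alt, ...
coords <- lapply(polys, function(p) {
  txt <- xmlValue(getNodeSet(p, ".//coordinates")[[1]])
  txt <- gsub("^\\s+|\\s+$", "", txt)       # drop leading/trailing whitespace
  as.numeric(strsplit(txt, ",|\\s+")[[1]])  # split on commas and whitespace
})
```

Trimming first matters: without it, the leading newline inside <coordinates> produces an empty string, which as.numeric() turns into NA.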
>
>> but this is hardly general!
>>
>> I'm hoping there is some relatively straightforward way to use
>> functions from the XML package to recursively descend the structure
>> and return the text strings representing the polygons into, say, a
>> list with as many elements as there are polygons. I've been looking at
>> several XML documentation files downloaded from
>> http://www.omegahat.org/RSXML/ , but since my understanding of XML is
>> weak at best, I'm having trouble. I can deal with converting the text
>> strings to an R object suitable for plotting etc.
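For the conversion step, a minimal base-R sketch. The helper name kml_coords_to_matrix is hypothetical; it assumes tuples are separated by whitespace and that every tuple has the same number of comma-separated values (lon,lat or lon,lat,alt), as in the sample coordinates quoted in this thread. The result has one row per vertex, which can be passed straight to polygon().

```r
# Hypothetical helper: turn a KML <coordinates> string into a numeric matrix
# with one row per vertex and columns lon, lat (and alt, if present)
kml_coords_to_matrix <- function(txt) {
  txt <- gsub("^\\s+|\\s+$", "", txt)       # trim surrounding whitespace
  tuples <- strsplit(txt, "\\s+")[[1]]      # one string per vertex
  parts <- strsplit(tuples, ",")            # split each vertex into its values
  m <- do.call(rbind, lapply(parts, as.numeric))
  colnames(m) <- c("lon", "lat", "alt")[seq_len(ncol(m))]
  m
}

txt <- "
  -23.679835352296,30.263840290388,5.000000000000001
  -23.68138782285701,30.264740875186,5.000000000000001
  -23.679835352296,30.263840290388,5.000000000000001 "
m <- kml_coords_to_matrix(txt)
# plot(m[, "lon"], m[, "lat"], type = "n"); polygon(m[, "lon"], m[, "lat"])
```

Splitting on whitespace first and on commas second keeps each vertex together, so the matrix rows line up with the tuples in the file.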
>>
>>
>> Here's a look at the structure of this file:
>>
>> graphics[5]% grep Polygon doc.kml
>> <Polygon id="15342">
>> </Polygon>
>> <Polygon id="1073">
>> </Polygon>
>> <Polygon id="16508">
>> </Polygon>
>> <Polygon id="18665">
>> </Polygon>
>> <Polygon id="32903">
>> </Polygon>
>> <Polygon id="5232">
>> </Polygon>
>>
>> Each of the <Polygon> </Polygon> pairs contains a <coordinates>
>> element, as in this example:
>>
>>
>> <Polygon id="15342">
>> <outerBoundaryIs>
>> <LinearRing id="11467">
>> <coordinates>
>> -23.679835352296,30.263840290388,5.000000000000001
>> -23.68138782285701,30.264740875186,5.000000000000001
>> [snip]
>> -23.679835352296,30.263840290388,5.000000000000001
>> -23.679835352296,30.263840290388,5.000000000000001 </coordinates>
>> </LinearRing>
>> </outerBoundaryIs>
>> </Polygon>
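Given this structure, the outer ring of each polygon can also be reached with an explicit relative path, and the Polygon id attributes can label the resulting list. This is a sketch assuming the XML package, using a shortened, made-up copy of the fragment above with no KML namespace; inner rings (holes), if any, are ignored.

```r
library(XML)

# Shortened, made-up copy of the structure shown above (no namespace)
kml <- '<Document>
<Polygon id="15342"><outerBoundaryIs><LinearRing id="11467"><coordinates>
  -23.679835352296,30.263840290388,5 -23.68138782285701,30.264740875186,5
  -23.679835352296,30.263840290388,5 </coordinates></LinearRing></outerBoundaryIs></Polygon>
<Polygon id="1073"><outerBoundaryIs><LinearRing><coordinates>
  0,0,0 1,0,0 1,1,0 0,0,0 </coordinates></LinearRing></outerBoundaryIs></Polygon>
</Document>'
doc <- xmlParse(kml, asText = TRUE)

polys <- getNodeSet(doc, "//Polygon")

# Follow the outer ring explicitly, relative to each Polygon node
rings <- lapply(polys, function(p)
  xmlValue(getNodeSet(p, "./outerBoundaryIs/LinearRing/coordinates")[[1]]))

# Name each coordinate string by its Polygon id attribute
names(rings) <- sapply(polys, xmlGetAttr, "id")
```

Each element of rings is still the raw coordinate text, so it can be fed to whatever string-to-numeric conversion is already in use.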
>>
>>
>> Thanks!
>> -Don
>>
>>
>> p.s.
>> There is a lot of other stuff in this file, e.g., some points and
>> attributes of the points such as color, as well as a legend describing
>> what the polygons mean, but I can get by without all that stuff, at
>> least for now.
>>
>> Note also that readOGR() would in principle work, but the underlying
>> OGR libraries have some limitations that this file exceeds (see
>> http://www.gdal.org/ogr/drv_kml.html).
>>