[R] Importing huge XML-Files
Alexander Heidrich
alexander.heidrich at uni-jena.de
Sat Sep 1 21:34:00 CEST 2007
Dear all,
for my diploma thesis I have to import huge XML files into R for
statistical processing; "huge" here means about 33 MB.
I'm using the XML package, version 1.9.
Since reading the complete file into R via xmlTreeParse either doesn't
work or is too slow, I'm trying to use xmlEventParse, but I have got
completely stuck.
The file contains many different types of nodes:
+ <configuration>
- <Data>
- <dataSets noOfDataSets="50000">
- <dataSet number="1">
- <measurements>
- <measurement number="1">
- <MRType1>
<date>21.04.2005</date>
<time>10:00</time>
<plotCode>1</plotCode>
<collarCode />
<value>2,33</value>
<depth />
</MRType1>
</measurement>
- <measurement number="2">
- <MRType1>
<date>21.04.2005</date>
<time>10:00</time>
<plotCode>1</plotCode>
<collarCode />
<value>2,33</value>
<depth />
</MRType1>
<MRType2>
...
+ <personData>
+ <siteData>
I only need the measurement/MRType1 nodes - how can I do this?
Currently I am trying the following code:
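library(XML)
# accumulators filled by the handlers below
startElement.name <- character(0)
startElement.value <- character(0)
# called for every opening tag: record the tag name
xtract.startElement <- function(name, attr) {
  startElement.name <<- c(startElement.name, name)
}
# called for every text node: record the text
xtract.text <- function(text) {
  startElement.value <<- c(startElement.value, text)
}
xmlEventParse("/input.xml", list(startElement = xtract.startElement,
  text = xtract.text), useTagName = TRUE, addContext = FALSE)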
This only gives me two vectors: one with all the node names (including
the ones I don't need) and one with the values (again including the
ones I don't need), but I can't match names to values this way.
What I want is:
No.  date  time  plotCode  collarCode  value  depth
1    ...   ...   ...       ...         ...    ...
2    ...   ...   ...       ...         ...    ...
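To make the goal concrete, here is a minimal sketch of how the handlers might keep state so that each measurement/MRType1 record ends up as one row. It is only an illustration under assumptions, not a confirmed solution: make_handlers and its internal variable names are hypothetical, the field names are taken from the excerpt above, the MRType2 content in the sample string is invented for the demonstration, and it has not been run against the real 33 MB file.

```r
library(XML)

# Hypothetical helper: builds SAX handlers that keep only <MRType1> records.
make_handlers <- function(fields = c("date", "time", "plotCode",
                                     "collarCode", "value", "depth")) {
  rows   <- list()          # one named character vector per record
  number <- NA_character_   # "number" attribute of the current <measurement>
  record <- NULL            # non-NULL only while inside <MRType1>
  field  <- NULL            # name of the field currently being read

  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == "measurement") {
        number <<- if (is.null(attrs)) NA_character_ else unname(attrs["number"])
      } else if (name == "MRType1") {
        record <<- c(number = number,
                     setNames(rep("", length(fields)), fields))
      } else if (!is.null(record) && name %in% fields) {
        field <<- name
      }
    },
    text = function(x, ...) {
      # text may arrive in several chunks, so append rather than overwrite
      if (!is.null(field)) record[field] <<- paste0(record[field], x)
    },
    endElement = function(name, ...) {
      if (name == "MRType1") {
        rows[[length(rows) + 1L]] <<- record
        record <<- NULL
      }
      field <<- NULL
    }
  )

  list(handlers = handlers,
       result = function() as.data.frame(do.call(rbind, rows),
                                         stringsAsFactors = FALSE))
}

# Small in-memory sample; the <MRType2> content is invented for illustration.
sample_xml <- paste0(
  '<Data><dataSets noOfDataSets="1"><dataSet number="1"><measurements>',
  '<measurement number="1"><MRType1><date>21.04.2005</date>',
  '<time>10:00</time><plotCode>1</plotCode><collarCode />',
  '<value>2,33</value><depth /></MRType1></measurement>',
  '<measurement number="2"><MRType2><date>22.04.2005</date></MRType2>',
  '</measurement></measurements></dataSet></dataSets></Data>')

h <- make_handlers()
xmlEventParse(sample_xml, handlers = h$handlers, asText = TRUE,
              addContext = FALSE)
df <- h$result()

# The values use a decimal comma, so convert before treating them as numbers.
df$value <- as.numeric(sub(",", ".", df$value, fixed = TRUE))
print(df)
```

For the real file, the call would presumably take the file path instead of the string and drop asText = TRUE. Empty elements such as collarCode and depth come out as empty strings here; whether they should become NA is a separate decision.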
Any help is really appreciated. I have tried all week, starting with
xmlTreeParse, which works fine for files with 200 entries, but for
files with 50000 entries it keeps crashing my Core 2 Duo 2.4 GHz
machine.
Thanks so much in advance! If you need any further information, code
snippets or XML file details, please do not hesitate to mail!
Alex