Alexander Heidrich alexander.heidrich at uni-jena.de
Sat Sep 1 21:34:00 CEST 2007

Dear all,

For my diploma thesis I have to import huge XML files into R for
statistical processing; "huge" here means about 33 MB.

I'm using the XML package, version 1.9.

Since reading the complete file into R via xmlTreeParse doesn't
work or is too slow, I'm trying to use xmlEventParse, but I got
completely stuck.

I have many different types of nodes:

+ <configuration>

- <Data>
  - <dataSets noOfDataSets="50000">
   - <dataSet number="1">
    - <measurements>
     - <measurement number="1">
      - <MRType1>
         <collarCode />
         <depth />
     - <measurement number="2">
      - <MRType1>
         <collarCode />
         <depth />
+ <personData>
+ <siteData>

I only need the measurement/MRType1 nodes - how can I do this?  
Currently I am trying the following code:

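library(XML)

# Accumulators; they must exist before the handlers append to them
startElement.name <- character(0)
startElement.value <- character(0)

xtract.startElement <- function(name, attr) {
	startElement.name <<- c(startElement.name, name)    # every tag name
}

xtract.text <- function(text) {
	startElement.value <<- c(startElement.value, text)  # every text node
}

xmlEventParse("/input.xml", list(startElement = xtract.startElement,
              text = xtract.text), useTagName = TRUE, addContext = FALSE)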

This only gives me two lists: one with all the node names (even the
ones I don't need) and one with the values (again including the
ones I don't need), and I can't match the two up this way.

What I want is:

No.  Date  Time  Plotcode  collarcode  value  depth
1    ...   ...   ...       ...         ...    ...
2    ...   ...   ...       ...         ...    ...

Any help is really, really appreciated. I have been trying the whole week,
starting with xmlTreeParse, which works fine for files with 200
entries, but for files with 50000 entries it keeps crashing my
2.4 GHz Core 2 Duo machine.

Thanks so much in advance! If you need any further information, code
snippets or XML file details, please do not hesitate to mail!


