[R] xmlOutputBuffer vs xmlOutputDOM
Arjun Ravi Narayan
arjunravinarayan at gmail.com
Sun Jul 8 18:08:47 CEST 2007
Hi,
I am trying to use the XML package to write some data (pretty large
amounts of data) into XML files. I experimented with a few variations,
using xmlOutputBuffer and xmlOutputDOM.
xmlOutputDOM provides neatly formatted, indented output, but takes very
long. xmlOutputBuffer is incompatible (in my experience) with the
saveXML function, so I hacked around it by passing its $value()
to cat. Unfortunately this loses all proper formatting, and
gives me an XML file with a new line after every tag or entry and
no indenting at all.
However, xmlOutputDOM takes very long - I am outputting rather large
files, and where xmlOutputBuffer takes about 10-15 seconds,
xmlOutputDOM takes about 20 minutes.
Am I using xmlOutputDOM in some wrong way? Is there a way to get
proper formatting out of xmlOutputBuffer? Either of these solutions
would be useful, as I see no advantage to using one over the other for
just outputting lots of data (>10000 fields at minimum).
Below is my code, and after that, an output of the times that were
reported on a sample run:
library(XML)
buffer <- xmlOutputBuffer()
buffer2 <- xmlOutputDOM()
buffer$addTag("outside", close = FALSE)
buffer2$addTag("outside", close = FALSE)
for (i in 1:1000) {
  buffer$addTag("tag", i)
  buffer2$addTag("tag", i)
}
buffer$closeTag()
buffer2$closeTag()
system.time(cat(buffer$value(), file = "foo2.xml"))
system.time(saveXML(buffer2$value(), file = "foo.xml"))
Times reported: xmlOutputDOM is more than 100x slower.
> system.time(cat(buffer$value(), file = "foo2.xml"))
   user  system elapsed
  0.004   0.000   0.001
> system.time(saveXML(buffer2$value(), file = "foo.xml"))
   user  system elapsed
  0.476   0.024   0.516
>
I am using R version 2.5.1 and XML package version 1.9-0.
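For flat data like the example above, one possible workaround is to skip both
handlers and write the indented lines directly. The sketch below uses base R
only (no XML package calls), and it assumes a single level of nesting with no
attributes. The names values, escape_xml and foo3.xml are made up for
illustration and are not part of the XML package.

```r
# Hypothetical sample data standing in for the real fields.
values <- 1:1000

# Escape the characters that are special in XML text content.
# "&" is replaced first so that later entities are not double-escaped.
escape_xml <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x
}

# Build one indented line per field, wrapped in the root element.
xml_lines <- c(
  "<outside>",
  paste("  <tag>", escape_xml(as.character(values)), "</tag>", sep = ""),
  "</outside>"
)

# Hypothetical output file name; writeLines puts each element on its own line.
writeLines(xml_lines, "foo3.xml")
```

Because paste is vectorised, the whole file is assembled without an explicit
loop. This only covers the simple one-level case; deeper nesting would need
the indentation computed per level.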
Yours sincerely,
Arjun Ravi Narayan