[R] SAX Parser best practise
Jan Hummel
Hummel at mpimp-golm.mpg.de
Mon Sep 26 10:13:53 CEST 2005
Hi Duncan,
thanks again for your comments.
> I dug around in the libxml code and the Web to verify that
> validation is indeed only possible in libxml when one uses
> DOM (i.e. xmlTreeParse()).
Using DOM is not an option for me, so I need to "validate" the XML parts
I'm interested in within my own creation mechanism. That works, but it is
not the best solution in terms of design.
> BTW, there is a new version of the XML package on the
> Omegahat web site.
I'll be using it extensively over the next few days, and unfortunately I
already have a question/problem pending.
Take the following R function:
test <- function() {
  sep <- ""
  xmlText <- ""
  xmlText <- paste(xmlText, "<spectrum id=\"3257\">", sep = sep)
  xmlText <- paste(xmlText, "<mzArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<data>Monday</data>", sep = sep)
  xmlText <- paste(xmlText, "</mzArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<intenArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<data>Tuesday</data>", sep = sep)
  xmlText <- paste(xmlText, "</intenArrayBinary>", sep = sep)
  # xmlText <- paste(xmlText, "</spectrum>", sep = sep)
  # xmlText <- paste(xmlText, "<spectrum id=\"3259\">", sep = sep)
  xmlText <- paste(xmlText, "<mzArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<data>Wednesday</data>", sep = sep)
  xmlText <- paste(xmlText, "</mzArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<intenArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "<data>Thursday</data>", sep = sep)
  xmlText <- paste(xmlText, "</intenArrayBinary>", sep = sep)
  xmlText <- paste(xmlText, "</spectrum>", sep = sep)
  xmlEventParse(xmlText, asText = TRUE,
                handlers = list(text = function(x, ...) {
                  cat(nchar(x), x, "\n")
                }))
  return(invisible(NULL))
}
Using this function in the given form works fine: xmlEventParse(), with
the simplest handler I can imagine, finds all 4 text nodes within the
<spectrum> tag and prints them out. But if one uncomments the two
commented-out lines in the middle, introducing 2 <spectrum> tags with
different ids, xmlEventParse() fails with an error. The weekdays within
<data> are of course just arbitrary values used for illustration.
Furthermore, using another input file, I could see that for one and the
same <data> node the handler for "text" nodes was invoked twice: once for
the first part of the content and once for the rest. Both invocations
together gave me exactly the content of the <data> node.
So, am I on the wrong track? Or is this buggy behaviour?
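A possible reading of both observations, offered as a sketch rather than
a confirmed diagnosis: with the two lines uncommented, the text contains
two top-level <spectrum> elements, and XML allows only one root element
per document, so a parse error would be expected. Likewise, SAX-style
parsers are allowed to deliver the text of a single node in several
pieces, so a handler may have to collect the pieces itself. The code
below illustrates both points. The function makeDataCollector() and the
wrapper element <spectrumList> are hypothetical names chosen for this
example only; they are not part of the XML package or of my real data.

```r
# Hypothetical helper (not part of the XML package): builds SAX handlers
# that collect the complete text of every <data> element, even when the
# parser hands that text over in several pieces.
makeDataCollector <- function() {
  buffer <- character(0)  # pieces of the <data> element currently open
  inData <- FALSE         # TRUE while we are inside a <data> element
  values <- character(0)  # one complete string per finished <data>

  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == "data") {
        inData <<- TRUE
        buffer <<- character(0)
      }
    },
    text = function(x, ...) {
      # May be called more than once per node; just append the piece.
      if (inData) buffer <<- c(buffer, x)
    },
    endElement = function(name, ...) {
      if (name == "data") {
        values <<- c(values, paste(buffer, collapse = ""))
        inData <<- FALSE
      }
    }
  )
  list(handlers = handlers, values = function() values)
}

# Two <spectrum> elements wrapped in a single root element.
# <spectrumList> is an assumed name used only for illustration.
xmlText <- paste(
  "<spectrumList>",
  "<spectrum id=\"3257\"><mzArrayBinary><data>Monday</data></mzArrayBinary></spectrum>",
  "<spectrum id=\"3259\"><mzArrayBinary><data>Wednesday</data></mzArrayBinary></spectrum>",
  "</spectrumList>",
  sep = "")

collector <- makeDataCollector()
# Only run the parse if the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  XML::xmlEventParse(xmlText, asText = TRUE, handlers = collector$handlers)
  print(collector$values())
}
```

Keeping the state in a closure means the three handlers share one buffer
without any global variables, and the collected strings can be read back
with collector$values() after the parse.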
I appreciate any help and assistance!
Jan