[R] SAX Parser best practise
Duncan Temple Lang
duncan at wald.ucdavis.edu
Mon Sep 26 16:13:28 CEST 2005
When you uncomment the two lines, your document
contains two top-level nodes:
<spectrum>
...
</spectrum>
<spectrum>
...
</spectrum>
XML requires that there be a single top-level node,
so the parser throws an error saying
Extra content at the end of the document
It is the second <spectrum> .. </spectrum>
node that it is complaining about.
You can wrap the entire thing in a top node, e.g.
<spectra> <spectrum>...</spectrum><spectrum>...</spectrum></spectra>
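
As a minimal sketch of that fix in R, assuming the XML package is installed: the two <spectrum> fragments are shortened versions of the ones in the quoted function, <spectra> is just an arbitrary wrapper name, and the variable names fragments and seen are made up for illustration.

```r
library(XML)

# Two sibling <spectrum> fragments (shortened from the quoted test() function).
# Parsed on their own, these form a document with two top-level nodes.
fragments <- paste0(
  "<spectrum id=\"3257\"><data>Monday</data></spectrum>",
  "<spectrum id=\"3259\"><data>Wednesday</data></spectrum>"
)

# Wrap everything in one top-level node; the name <spectra> is arbitrary.
xmlText <- paste0("<spectra>", fragments, "</spectra>")

# Collect every piece of text the SAX parser hands to the text handler.
seen <- character()
invisible(xmlEventParse(xmlText, asText = TRUE,
  handlers = list(text = function(x, ...) {
    seen <<- c(seen, x)
  })))

# Join the pieces, since a parser may deliver one text node in several calls.
cat(paste(seen, collapse = ""), "\n")
```

With the wrapper in place the parser sees a single root element, so both <spectrum> nodes are processed instead of the second one being reported as extra content.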
How did I find this? I looked at the error message from
libxml. Now that we have exceptions in R and we are using
libxml2, I can make these error messages available at the
R level. So I'll do that.
Jan Hummel wrote:
> Hi Duncan,
>
>
>>BTW, there is a new version of the XML package on the
>>Omegahat web site.
>
> I'll be using it extensively these days, and unfortunately I already have a
> question/problem pending:
>
> Taking the following R function:
>
> test<-function(){
> sep=""
> xmlText <-""
> xmlText <-paste(xmlText,"<spectrum id=\"3257\">",sep=sep)
> xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<data>Monday</data>",sep=sep)
> xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<data>Tuesday</data>",sep=sep)
> xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
> # xmlText <-paste(xmlText,"</spectrum>",sep=sep)
> # xmlText <-paste(xmlText,"<spectrum id=\"3259\">",sep=sep)
> xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<data>Wednesday</data>",sep=sep)
> xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"<data>Thursday</data>",sep=sep)
> xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
> xmlText <-paste(xmlText,"</spectrum>",sep=sep)
>
> xmlEventParse(xmlText, asText=TRUE, handlers = list(text =
> function(x, ...) {cat(nchar(x),x, "\n")}))
> return(invisible(NULL))
> }
>
> Using this function in the given form works fine. xmlEventParse() with
> the simplest handler I can imagine finds all 4 text nodes within the
> <spectrum> tag and prints them out. But if one uncomments both lines in
> the middle, introducing 2 <spectrum> tags with different ids,
> xmlEventParse() returns with an exception. Of course, the weekdays within
> <data> are arbitrary values used here. Further, using another input
> file, I could see that for one and the same <data> node the handler for
> "text" nodes was invoked twice, once for the first part of the
> content and once for the rest of the content. Both invocations
> together gave me exactly the content of the <data> node.
>
> So, am I on the wrong track? Or is this some buggy behaviour?
>
> I appreciate any help and assistance!
>
> Jan
--
Duncan Temple Lang duncan at wald.ucdavis.edu
Department of Statistics work: (530) 752-4782
371 Kerr Hall fax: (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA