[R-SIG-Mac] Failure message in R on Mac with xmlTreeParse
Duncan Temple Lang
duncan at wald.ucdavis.edu
Wed Dec 19 22:23:37 CET 2007
The [au] portion seems to be causing the problem.
So escape the [ and ] by mapping them to %5B and %5D respectively
_before_ handing the URL string to xmlTreeParse(). (The error message
indicates that the internals have already performed the conversion, but
if you do it yourself, things should work: I can reproduce your error
message, and I get the desired result by escaping the [ and ] first.)
There is more information about what needs to be escaped at
http://publib.boulder.ibm.com/infocenter/discover/v8r4/index.jsp?topic=/com.ibm.discovery.ds.ref.doc/t_RG_Escape_Sequences.htm
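As a minimal sketch of the escaping step described above: escape_brackets is a hypothetical helper name (it is not part of the XML package) and it handles only the two bracket characters, nothing else that might need escaping. The URL pieces are the ones from the original question.

```r
# Hypothetical helper: percent-encode only the square brackets in a search term.
escape_brackets <- function(term) {
  term <- gsub("[", "%5B", term, fixed = TRUE)  # fixed = TRUE: match "[" literally
  gsub("]", "%5D", term, fixed = TRUE)
}

srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="

# Build the URL with the term escaped before it reaches xmlTreeParse().
u <- paste(srch.stem, srch.mode, escape_brackets("meyer[au]"), sep = "")
```

The resulting string u can then be passed to xmlTreeParse(u, isURL = TRUE, useInternalNodes = TRUE) as in the original esearch() function.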
The HTTP/FTP code built into the xmlTreeParse(), htmlTreeParse() and
xmlEventParse() functions (it comes from libxml2) is minimalistic.
For better or worse, it is also the code R uses to implement
url() connections. It handles nothing in HTTP beyond simple
requests. So when I run into problems with xmlTreeParse() and a URL,
I first fetch the content of the document using the RCurl package.
For example,
library(RCurl)
getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]")
does fetch the document and the result can be passed directly to
xmlTreeParse().
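A sketch of this fetch-then-parse approach, assuming the XML and RCurl packages are installed. The names ids_from_xml and fetch_ids are hypothetical; asText = TRUE is used here to tell xmlTreeParse() explicitly that its first argument is the document text rather than a file name or URL.

```r
library(XML)
library(RCurl)

# Hypothetical helper: parse XML text and return the values of all <Id> nodes.
ids_from_xml <- function(txt) {
  doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
  xpathSApply(doc, "//Id", xmlValue)
}

# Hypothetical helper: fetch the document with RCurl, then parse the text.
# The endpoint is the one used in the original question.
fetch_ids <- function(term) {
  u <- paste("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?",
             "db=pubmed&retmax=10000&retmode=xml&term=", term, sep = "")
  ids_from_xml(getURL(u))
}
```

A call such as fetch_ids("meyer[au]") would then replace the esearch() call from the question; it needs network access, so no output is shown here.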
RCurl is an interface to libcurl, a very solid, stable and
feature-rich library for performing HTTP, HTTPS, FTP, ... client
queries. It allows us to do in R, programmatically, pretty much
anything a Web browser can do.
D.
Armin Goralczyk wrote:
> Hello
>
> In the following thread (R-help) the possibilities of analyzing
> publications from pubmed via XML were discussed:
>
> http://www.nabble.com/Analyzing-Publications-from-Pubmed-via-XML-to14328779.html#a14343090
>
> Using xmlTreeParse in a function results in a failure message on my
> Mac which is not reproduced in R for Windows:
>
>> esearch <- function (term){
> + srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
> + srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
> + doc <-xmlTreeParse(paste(srch.stem,srch.mode,term,sep=""),isURL = TRUE,
> + useInternalNodes = TRUE)
> + sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
> + }
>> term <- 'meyer'
>> pmid <- esearch(term) # works fine
>>
>> term <- 'meyer[au]'
>> pmid <- esearch(term)
> Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
> as.logical(ignoreBlanks), :
> error in creating parser for
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]
> I/O warning : failed to load external entity
> "http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&retmax=10000&retmode=xml&term=meyer%5Bau%5D"
>
> The problem seems to be the search tag [au].
> I am not very familiar with XML or the xmlTreeParse function, so I
> don't know what is wrong. Can anybody help?
>
> Thanks
>
> My version:
>> R.Version()
> $platform
> [1] "powerpc-apple-darwin8.10.1"
>
> $arch
> [1] "powerpc"
>
> $os
> [1] "darwin8.10.1"
>
> $system
> [1] "powerpc, darwin8.10.1"
>
> $status
> [1] "Patched"
>
> $major
> [1] "2"
>
> $minor
> [1] "6.0"
>
> $year
> [1] "2007"
>
> $month
> [1] "11"
>
> $day
> [1] "09"
>
> $`svn rev`
> [1] "43408"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 2.6.0 Patched (2007-11-09 r43408)"
>