[R-SIG-Mac] Failure message in R on Mac with xmlTreeParse

Duncan Temple Lang duncan at wald.ucdavis.edu
Wed Dec 19 22:23:37 CET 2007


The [au] portion seems to be causing the problem.
So escape the [ and ] by mapping them to %5B and %5D, respectively,
_before_ handing the URL string to xmlTreeParse().  (The error message
suggests that the internals have already performed this conversion, but
doing it yourself should work: I can reproduce your error message, and
I get the desired result by escaping the [ and ] first.)
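
As a minimal sketch of that escaping step (escape_brackets() is a hypothetical helper name, not part of the XML package; the URL pieces are the ones from the original post):

```r
# Hypothetical helper: percent-encode the square brackets in a query term.
# fixed = TRUE makes gsub() treat "[" and "]" as literal characters.
escape_brackets <- function(term) {
  term <- gsub("[", "%5B", term, fixed = TRUE)
  gsub("]", "%5D", term, fixed = TRUE)
}

srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="

# Build the URL with the brackets already escaped.
u <- paste(srch.stem, srch.mode, escape_brackets("meyer[au]"), sep = "")

# The escaped URL can then be handed to the parser (requires library(XML)):
# doc <- xmlTreeParse(u, isURL = TRUE, useInternalNodes = TRUE)
```

Base R's URLencode(term, reserved = TRUE) should have a similar effect on the term, but it also encodes other reserved characters, so apply it to the term only and not to the whole URL.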

There is more information about what needs to be escaped at
http://publib.boulder.ibm.com/infocenter/discover/v8r4/index.jsp?topic=/com.ibm.discovery.ds.ref.doc/t_RG_Escape_Sequences.htm

The HTTP/FTP code built into the xmlTreeParse(), htmlTreeParse() and
xmlEventParse() functions (specifically, the code from libxml2) is
minimalistic.  For better or worse, it is the same code that R uses to
implement url() connections.  It does not handle aspects of HTTP beyond
simple requests.  So when I run into problems with xmlTreeParse() and a
URL, I first fetch the content of the document using the RCurl package.

For example,

library(RCurl)
getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]")

does fetch the document, and the result can be passed directly to
xmlTreeParse().
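
A rough sketch of that fetch-then-parse workflow, reworking the esearch() function from the original post.  The name esearch_ids() and its fetch argument are hypothetical, introduced here only so the download step can be swapped out; asText = TRUE tells xmlTreeParse() that it is receiving the XML content itself and not a file name or URL.

```r
library(XML)

# Hypothetical helper: download an esearch result with `fetch`
# (RCurl::getURL by default) and return the PubMed IDs it contains.
esearch_ids <- function(term, fetch = RCurl::getURL) {
  srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
  srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
  # Fetch the document as a single character string.
  txt <- fetch(paste(srch.stem, srch.mode, term, sep = ""))
  # Parse the string itself, not a URL.
  doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
  # Collect the text of every <Id> node.
  xpathSApply(doc, "//Id", xmlValue)
}

# pmid <- esearch_ids("meyer[au]")   # needs network access and the RCurl package
```

Keeping the download separate from the parse also makes it easier to see which of the two steps fails.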

RCurl is an interface to libcurl, which is a very solid, stable
and feature-rich library for performing HTTP, HTTPS, FTP, ... client
queries.  It allows us to do in R, programmatically, pretty much
anything a Web browser can do.

 D.

Armin Goralczyk wrote:
> Hello
> 
> In the following thread (R-help) the possibilities of analyzing
> publications from pubmed via XML were discussed:
> 
> http://www.nabble.com/Analyzing-Publications-from-Pubmed-via-XML-to14328779.html#a14343090
> 
> Using xmlTreeParse in a function results in a failure message on my
> Mac which is not reproduced in R for Windows:
> 
>> esearch <- function (term){
> + 	srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
> + 	srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
> + 	doc <-xmlTreeParse(paste(srch.stem,srch.mode,term,sep=""),isURL = TRUE,
> + 		useInternalNodes = TRUE)
> + 	sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
> + 	}
>> term <- 'meyer'
>> pmid <- esearch(term) # works fine
>>
>> term <- 'meyer[au]'
>> pmid <- esearch(term)
> Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
> as.logical(ignoreBlanks),  :
>   error in creating parser for
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]
> I/O warning : failed to load external entity
> "http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&retmax=10000&retmode=xml&term=meyer%5Bau%5D"
> 
> The problem seems to be the search tag [au].
> I am not very familiar with XML or the xmlTreeParse function, so I
> don't know what is wrong. Can anybody help?
> 
> Thanks
> 
> My version:
>> R.Version()
> $platform
> [1] "powerpc-apple-darwin8.10.1"
> 
> $arch
> [1] "powerpc"
> 
> $os
> [1] "darwin8.10.1"
> 
> $system
> [1] "powerpc, darwin8.10.1"
> 
> $status
> [1] "Patched"
> 
> $major
> [1] "2"
> 
> $minor
> [1] "6.0"
> 
> $year
> [1] "2007"
> 
> $month
> [1] "11"
> 
> $day
> [1] "09"
> 
> $`svn rev`
> [1] "43408"
> 
> $language
> [1] "R"
> 
> $version.string
> [1] "R version 2.6.0 Patched (2007-11-09 r43408)"
> 