[BioC] Trouble querying pubmed on strings
Seth Falcon
sfalcon at fhcrc.org
Sun Nov 6 20:03:47 CET 2005
Hi Ken,
On 4 Nov 2005, jerk_alert at hotmail.com wrote:
> hi all,
>
> i'm trying to get a function working that queries pubmed with any
> string and returns pubMedAbst objects corresponding to the pubmed
> article hits from the query string...
>
> this is my code so far, based partly from annotate's 'query.pdf' and
> also from the perl script from NCBI at
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
> pmSrch <- function(query)
> {
> utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils"
>
> esearch <- paste(utils, "/esearch.fcgi?" ,
> "report=xml&mode=text&tool=bioconductor&",
> "db=Pubmed&retmax=1&usehistory=y&term=", query)
> esearch <- gsub(" ", "", esearch)
You might find the sep and collapse arguments to paste useful here.
No need for gsub then. That would also allow you to make the query
string a bit easier to read.
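
As a minimal sketch of that suggestion (the helper name buildEsearchUrl is hypothetical, and the use of URLencode on the search term is my own assumption, not something in your code), the URL could be assembled like this:

```r
## Hypothetical helper: build the esearch URL with paste(sep=, collapse=)
## instead of stripping spaces afterwards with gsub().
buildEsearchUrl <- function(query)
{
    base <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    ## One "key=value" string per argument keeps the query easy to read.
    args <- c("report=xml",
              "mode=text",
              "tool=bioconductor",
              "db=Pubmed",
              "retmax=1",
              "usehistory=y",
              ## Assumption: percent-encode the term so that spaces in
              ## the query survive instead of being removed.
              paste("term", URLencode(query, reserved=TRUE), sep="="))
    ## collapse="&" joins the arguments; sep="" avoids stray spaces.
    paste(base, paste(args, collapse="&"), sep="")
}

cat(buildEsearchUrl("trk"), "\n")
```

Since nothing inserts a space in the first place, the gsub call is no longer needed.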
> i don't know perl and i end up with numAbst = 8 (regardless of the
> search string) and esearch =
If you look at what you get back:
lapply(xmlChildren(xmlRoot(pms)), xmlValue)
And look at the last part of the Perl example [1], you will see that
the search results have to be fetched in two steps. Here is a very
rough cut of a function to fetch results after the first query:
pmExtract <- function(pmSrchResult)
{
    ## Pull the history identifiers out of the esearch result
    dom <- xmlRoot(pmSrchResult)
    searchData <- lapply(xmlChildren(dom), xmlValue)
    webEnv <- searchData$WebEnv
    queryKey <- searchData$QueryKey
    ## Build the efetch URL from one "key=value" string per argument
    utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
    args <- c("rettype=abstract",
              "retmode=xml",
              "retstart=0",
              "retmax=3",
              "db=pubmed",
              paste("query_key", queryKey, sep="="),
              paste("WebEnv", webEnv, sep="="))
    args <- paste(args, collapse="&")
    utils <- paste(utils, args, sep="")
    ## Print the URL for debugging, then fetch and parse the response
    cat(utils, "\n")
    return(.handleXML(utils))
}
So then you would do:
res1 <- pmSrch("trk")
res2 <- pmExtract(res1)
## process res2 to extract the XML abstracts, etc
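
To see the second step without hitting the network, here is a rough sketch that runs the same extraction on a made-up esearch response. The XML string and its values (including EXAMPLE_WEBENV) are invented for illustration and trimmed to a few fields, and it assumes the XML package is installed:

```r
library(XML)  # assumed to be installed

## Made-up esearch response; a real one carries more fields.
sampleXml <- paste("<eSearchResult>",
                   "<Count>255</Count>",
                   "<QueryKey>1</QueryKey>",
                   "<WebEnv>EXAMPLE_WEBENV</WebEnv>",
                   "</eSearchResult>", sep="")

## Parse the string and flatten the top-level children to a named list.
doc <- xmlTreeParse(sampleXml, asText=TRUE)
searchData <- lapply(xmlChildren(xmlRoot(doc)), xmlValue)

## Same URL construction as in pmExtract, minus the fetch itself.
args <- c("rettype=abstract",
          "retmode=xml",
          "retstart=0",
          "retmax=3",
          "db=pubmed",
          paste("query_key", searchData$QueryKey, sep="="),
          paste("WebEnv", searchData$WebEnv, sep="="))
efetchUrl <- paste("http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?",
                   paste(args, collapse="&"), sep="")
cat(efetchUrl, "\n")
```

The point is only that QueryKey and WebEnv from the first response are what tie the efetch call back to the stored search.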
Hope that helps to get you going.
Best,
+ seth
[1] http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl