[BioC] Trouble querying pubmed on strings

Seth Falcon sfalcon at fhcrc.org
Sun Nov 6 20:03:47 CET 2005


Hi Ken,

On  4 Nov 2005, jerk_alert at hotmail.com wrote:
> hi all,
>
> i'm trying to get a function working that queries pubmed with any
> string and returns pubMedAbst objects corresponding to the pubmed
> article hits from the query string...
>
> this is my code so far, based partly on annotate's 'query.pdf' and
> also from the perl script from NCBI at
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

> pmSrch <- function(query)
> {
> utils <- "http://www.ncbi.nlm.nih.gov/entrez/eutils"
>
> esearch <- paste(utils, "/esearch.fcgi?" , 
> "report=xml&mode=text&tool=bioconductor&", 
> "db=Pubmed&retmax=1&usehistory=y&term=", query)
> esearch <- gsub(" ", "", esearch)

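You might find the sep and collapse arguments to paste useful here:
paste(..., sep="") joins the pieces without inserting spaces, and
collapse="&" joins a vector of arguments into one string.  Then there
is no need for gsub, which as written also deletes any spaces inside
your query term.  Building the arguments as a vector would also make
the query string a bit easier to read.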
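A minimal sketch of what that might look like.  The helper name
pmSearchURL and its retmax argument are hypothetical; the base URL and
parameter names are simply copied from the quoted code, and
utils::URLencode() is an addition (not in the original) so that a
multi-word term is escaped rather than squashed:

```r
## Hypothetical helper: build the ESearch URL with paste(sep=, collapse=)
## instead of pasting with spaces and stripping them with gsub().
## Base URL and parameter names are copied from the quoted code.
pmSearchURL <- function(query, retmax = 1)
{
    base <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    args <- c("report=xml",
              "mode=text",
              "tool=bioconductor",
              "db=Pubmed",
              paste("retmax", retmax, sep = "="),
              "usehistory=y",
              ## escape spaces and reserved characters in the term itself,
              ## rather than deleting them as gsub(" ", "", ...) would
              paste("term", utils::URLencode(query, reserved = TRUE), sep = "="))
    paste(base, paste(args, collapse = "&"), sep = "")
}
```

For example, pmSearchURL("trk receptor") ends in term=trk%20receptor,
whereas the gsub() version would have sent term=trkreceptor.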

> i don't know perl and i end up with numAbst = 8 (regardless of the
> search string) and esearch =

If you look at what you get back:

  lapply(xmlChildren(xmlRoot(pms)), xmlValue)

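and compare it with the last part of the Perl example [1], you will see
that the search results have to be fetched in two steps: the ESearch
request (with usehistory=y) stores the hits on the Entrez history
server and returns a WebEnv and QueryKey that refer to them, and a
second EFetch request uses those two values to retrieve the records.
Here is a very rough cut of a function to fetch results after the
first query: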

pmExtract <- function(pmSrchResult)
{
    ## Pull WebEnv and QueryKey out of the ESearch result
    dom <- xmlRoot(pmSrchResult)
    searchData <- lapply(xmlChildren(dom), xmlValue)
    webEnv <- searchData$WebEnv
    queryKey <- searchData$QueryKey

    ## Build the EFetch URL; retstart/retmax select which hits to fetch
    url <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
    args <- c("rettype=abstract",
              "retmode=xml",
              "retstart=0",
              "retmax=3",
              "db=pubmed",
              paste("query_key", queryKey, sep="="),
              paste("WebEnv", webEnv, sep="="))
    args <- paste(args, collapse="&")
    url <- paste(url, args, sep="")
    cat(url, "\n")    # print the URL, handy for debugging
    ## .handleXML() (internal to annotate) retrieves and parses the XML
    return(.handleXML(url))
}


So then you would do:

res1 <- pmSrch("trk")
res2 <- pmExtract(res1)

## process res2 to extract the XML abstracts, etc
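pmExtract() as written fetches only the first three hits (retstart=0,
retmax=3).  If you want all of them, one option is to read the total
from the Count element of the ESearch result and step retstart through
it in batches.  Below is a hypothetical helper, pmFetchURLs(), that only
builds the URLs, so it can be checked without touching the network; the
batch size of 100 is an arbitrary assumption, not an NCBI
recommendation:

```r
## Hypothetical helper (not part of annotate): build one EFetch URL per
## batch of hits, stepping retstart through the total hit count.
## count is assumed to come from the Count element of the ESearch result,
## e.g. as.integer(searchData$Count) inside pmExtract().
pmFetchURLs <- function(webEnv, queryKey, count, batch = 100)
{
    base <- "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
    count <- as.integer(count)
    batch <- as.integer(batch)   # integers paste without scientific notation
    if (count <= 0)
        return(character(0))
    starts <- as.integer(seq(0, count - 1, by = batch))   # retstart is 0-based
    vapply(starts, function(start) {
        args <- c("rettype=abstract",
                  "retmode=xml",
                  paste("retstart", start, sep = "="),
                  paste("retmax", batch, sep = "="),
                  "db=pubmed",
                  paste("query_key", queryKey, sep = "="),
                  paste("WebEnv", webEnv, sep = "="))
        paste(base, paste(args, collapse = "&"), sep = "")
    }, character(1))
}
```

Each URL could then be handed to the same .handleXML() call that
pmExtract() uses, and the parsed results combined.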


Hope that helps to get you going.

Best,

+ seth


[1] http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl


