[R] Analyzing Publications from Pubmed via XML
    Robert Gentleman 
    rgentlem at fhcrc.org
       
    Fri Dec 14 05:35:27 CET 2007
    
    
  
Or just try looking in the annotate package from Bioconductor, which includes tools for querying PubMed from R.
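
If you stay with the XPath route from the quoted reply below, getting to a data frame mostly means walking the feed one item at a time. Here is a rough sketch of that idea, not a tested recipe for the real feed. The inline XML is a made-up stand-in so the code runs without a network connection. The use of <category> for the journal follows the quoted code, while <pubDate> as the source of the year and the helper name first_value are my assumptions, so check them against what your feed actually returns.

```r
library(XML)

# Made-up stand-in for the feed; the real feed's element names and
# layout may differ (<pubDate> in particular is an assumption).
rss <- '<rss version="2.0"><channel>
  <item><title>Paper A</title><category>J Example Med</category>
        <pubDate>Mon, 03 Dec 2007 00:00:00 EST</pubDate></item>
  <item><title>Paper B</title><category>Example Otol</category>
        <pubDate>Tue, 14 Nov 2006 00:00:00 EST</pubDate></item>
  <item><title>Paper C</title><category>J Example Med</category>
        <pubDate>Fri, 05 Jan 2007 00:00:00 EST</pubDate></item>
</channel></rss>'

# xmlParse() returns internal nodes, which XPath queries require
doc <- xmlParse(rss, asText = TRUE)

# Hypothetical helper: text of the first match under a node, or NA if none
first_value <- function(node, path) {
  hit <- xpathSApply(node, path, xmlValue)
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

# Work item by item so journal and date stay aligned row for row
items   <- getNodeSet(doc, "//item")
journal <- vapply(items, first_value, character(1), path = "./category")
pubdate <- vapply(items, first_value, character(1), path = "./pubDate")

# Pull the first four-digit year (19xx or 20xx) out of the date string
year <- ifelse(grepl("(19|20)[0-9]{2}", pubdate),
               sub("^.*?((19|20)[0-9]{2}).*$", "\\1", pubdate, perl = TRUE),
               NA)
year <- as.integer(year)

pubs <- data.frame(journal = journal, year = year, stringsAsFactors = FALSE)

# Articles per journal per year
print(table(pubs$journal, pubs$year))

# To get a CSV file instead (file name is only an example):
# write.csv(pubs, "pubs.csv", row.names = FALSE)
```

Querying each item separately, rather than running "//category" over the whole document, keeps the columns lined up even when an item is missing one of the elements. For the real feed, replace the inline string with the parsed URL from the quoted code.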
Gabor Grothendieck wrote:
> On Dec 13, 2007 9:03 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
>> I would like to track in which journals articles about a particular disease
>> are being published. Creating a PubMed search is trivial. The search
>> provides data, but obviously not as an R data frame. I can get the search to
>> export the data as an XML feed, and the XML package seems to be able to read
>> it.
>>
>> xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-", isURL=TRUE)
>>
>> But getting from there to a dataframe in which one column would be the name
>> of the journal and another column would be the year (to keep things simple)
>> seems to be beyond my capabilities.
>>
>> Has anyone ever done this, and could you share your script? Are there any
>> published examples where the end result is a data frame?
>>
>> I guess what I am looking for is an easy and simple way to parse the feed
>> and extract the data. Alternatively, how does one turn an RSS feed into a CSV
>> file?
> 
> Try this:
> 
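> library(XML)
> # Parse the feed into internal nodes so that XPath queries can be run on it
> doc <- xmlTreeParse(
>   "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-",
>   isURL = TRUE, useInternalNodes = TRUE)
> # Collect the text of every <author> and every <category> element
> sapply(c("//author", "//category"), xpathApply, doc = doc, fun = xmlValue)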
> 
-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
    
    