[BioC] Fetching documents from PubMed
Kaustubh Patil
kaustubhp_in at yahoo.com
Wed Feb 22 20:44:40 CET 2006
Hi,
I forgot to attach the file.
Here it is.
Kaustubh
Kaustubh Patil <kaustubhp_in at yahoo.com> wrote:
Dear Robert,
Thanks for your reply. First, some details about my system:
a Celeron 2.5 with 512 MB RAM, running Fedora Core 4,
R version 2.2.1 (2005-12-20 r36812) with RSXML 0.99.
I am attaching a file that contains 2665 PMIDs that I want to fetch. Load it with
load("ids")
which creates a variable named ids.
If I then run the following code, I get only 363 abstracts:
library(annotate)  # pubmed(), buildPubMedAbst(), abstText()
library(XML)       # xmlRoot(), xmlApply()
docs <- pubmed(ids)
root <- xmlRoot(docs)
arts <- xmlApply(root,buildPubMedAbst)
absts <- sapply(arts,abstText)
length(absts)
[1] 363
Interestingly, these are the first 363 abstracts. The 364th ("12136003") can be fetched manually, and also with the MedlineR library.
Am I missing something here?
Robert Gentleman <rgentlem at fhcrc.org> wrote:
Hi,
pubmed makes precisely one request, so there is no issue with timing.
In many cases you can make a single request for lots of things, rather
than lots of requests for one thing. If you stick it in a for loop then
there could be problems, but so far not a single person has reported
hitting this particular wall.
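A minimal sketch of the loop case, in case a long PMID list does have to be split across several requests. `chunk_ids()` and `fetch_in_batches()` are hypothetical helpers (base R only, not part of annotate), and the default batch size of 200 is an arbitrary assumption; the 3-second pause follows the NCBI rule mentioned in the original question.

```r
# Hypothetical helpers, not part of annotate.
# Split a vector of PMIDs into consecutive batches of at most `size` ids.
chunk_ids <- function(ids, size) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Call a user-supplied fetch function on each batch, pausing `delay`
# seconds before every request after the first one.
fetch_in_batches <- function(ids, fetch, size = 200, delay = 3) {
  batches <- chunk_ids(ids, size)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    if (i > 1) Sys.sleep(delay)
    results[[i]] <- fetch(batches[[i]])
  }
  results
}
```

For example, `fetch_in_batches(ids, pubmed)` would return a list of XML documents, each of which could then go through `xmlRoot()` and `xmlApply(..., buildPubMedAbst)` as above. Passing the fetch function in as an argument keeps the batching logic testable without network access.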
As for why only 377 came back, did you check what happens if you
request one of the missing ones by itself? Or go to the NLM website
and see whether your PubMed id is valid?
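To find out which requested PMIDs are missing, so that one of them can be requested by itself, the requested ids can be compared with the PMIDs of the records that were parsed. A sketch, assuming `ids` and `arts` from the code above and annotate's `pmid()` accessor for `pubMedAbst` objects; `missing_ids()` is a hypothetical helper:

```r
# Hypothetical helper: PMIDs that were requested but not returned.
missing_ids <- function(requested, returned) {
  setdiff(as.character(requested), as.character(returned))
}

# With the objects from the code above (annotate must be loaded):
# returned <- sapply(arts, pmid)   # PMIDs of the parsed records
# gone <- missing_ids(ids, returned)
# length(gone)
# pubmed(gone[1])                  # request one missing id by itself
```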
Also, please do read the posting guide and tell us something about your
system.
thanks
Robert
Kaustubh Patil wrote:
> Hi,
>
> I want to fetch documents from PubMed, so I first get all the PMIDs and then use the "pubmed" function from the annotate package. Does this function take care of the NCBI rule of waiting 3 seconds between queries?
>
> Also, I have a list of 718 PMIDs, but the function retrieves only 377 of them, and I don't understand why. Suggestions appreciated.
>
> Thank you and regards,
> Kaustubh
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org