[BioC] Fw: Warning of function "ncbiTaxonomy"
Chris Stubben
stubben at lanl.gov
Mon Mar 5 17:21:48 CET 2012
I sent this on Friday, but I'm not sure what happened to it. I
apologize if this reposts.
> I have a list of NCBI taxon IDs for which I would like to get both
> the full lineage and the common name. So I installed the package
> 'genomes' (genomes_2.0.0.zip), then used the function 'ncbiTaxonomy'
> as follows:
> ncbiTaxonomy(1000587, "lineage")
> Premature end of data in tag TaxaSet line 1
The recent NCBI E-utilities updates (version 2.0 of ESummary and EFetch)
have broken a number of functions in my genomes package, including
ncbiTaxonomy, so I decided to simplify and rewrite all of the NCBI
E-utility code and separate it from the parsers. A complete
description is on GitHub at https://github.com/cstubben/ncbi, and
I will update the genomes dev package in a few weeks once I get
everything worked out. After the next update it should work something
like this.
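Keeping the request code separate from the parsers can be sketched in base R. The `eutils_url()` helper below is hypothetical (it is not the code in the genomes or ncbi packages): it only assembles a request URL for the standard E-utilities endpoint `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/<utility>.fcgi` and leaves downloading and parsing to other functions.

```r
# Hypothetical helper, not part of the genomes package: build an NCBI
# E-utilities request URL without sending it, so that retrieval and
# parsing can live in separate functions.
eutils_url <- function(utility, ...) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
  params <- list(...)
  # Percent-encode each value, including reserved characters such as [ ] and spaces
  values <- vapply(params, function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  query <- paste(names(params), values, sep = "=", collapse = "&")
  paste0(base, utility, ".fcgi?", query)
}

url <- eutils_url("efetch", db = "taxonomy", id = "1000587", retmode = "xml")
print(url)
# The XML could then be downloaded in a separate step, e.g. readLines(url),
# and handed to whichever parser is appropriate.
```

Because the URL builder never touches the network, each parser can be tested against saved XML.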
Run einfo to list the searchable fields:
einfo("taxonomy")
   Name FullName
1  ALL  All Fields
2  ALLN All Names
3  COMN Common Name
4  EDAT Entrez Date
5  FILT Filter
6  GC   GC
7  LNGE Lineage
8  MGC  MGC
9  NXLV Next Level
...
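The field table above comes from the `<Field>` entries (each with a `<Name>` and `<FullName>`) of the EInfo XML for a database. A rough base-R sketch of that parsing step follows; `parse_einfo_fields()` is a hypothetical name, regular expressions stand in for a real XML parser, and the input is a trimmed, hand-written sample rather than live NCBI output.

```r
# Hypothetical parser: turn the <Field> entries of EInfo XML into a
# Name/FullName table. Regular expressions are enough for this flat layout
# but are not a substitute for a real XML parser.
parse_einfo_fields <- function(xml) {
  fields <- regmatches(xml, gregexpr("(?s)<Field>.*?</Field>", xml, perl = TRUE))[[1]]
  # Text of the first <tag>...</tag> inside one field, or NA if absent
  tag_value <- function(field, tag) {
    m <- regmatches(field, regexpr(paste0("<", tag, ">[^<]*</", tag, ">"), field))
    if (length(m) == 0) return(NA_character_)
    gsub("<[^>]+>", "", m)
  }
  data.frame(
    Name = vapply(fields, tag_value, character(1), tag = "Name", USE.NAMES = FALSE),
    FullName = vapply(fields, tag_value, character(1), tag = "FullName", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Trimmed, hand-written sample using three of the fields listed above
sample_xml <- paste0(
  "<eInfoResult><DbInfo><DbName>taxonomy</DbName><FieldList>",
  "<Field><Name>ALL</Name><FullName>All Fields</FullName></Field>",
  "<Field><Name>COMN</Name><FullName>Common Name</FullName></Field>",
  "<Field><Name>LNGE</Name><FullName>Lineage</FullName></Field>",
  "</FieldList></DbInfo></eInfoResult>"
)
fields <- parse_einfo_fields(sample_xml)
print(fields)
```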
Then run esearch on the lineage field ([LNGE]) and pass the result to
esummary to get every taxon in the lineage (I think the results are
usually sorted phylogenetically):
esummary( esearch("Huitzilac virus[LNGE]", "taxonomy"))
       Id Rank         Division ScientificName
1 1000587 species      viruses  Huitzilac virus
2  339351              viruses  unclassified Hantavirus
3   11598 genus        viruses  Hantavirus
4   11571 family       viruses  Bunyaviridae
5   35301              viruses  ssRNA negative-strand viruses
6  439488              viruses  ssRNA viruses
7   10239 superkingdom viruses  Viruses
8       1                       root
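Chaining the two calls amounts to pulling the UIDs out of the ESearch response and sending them to ESummary as one comma-separated `id` parameter. The sketch below uses hypothetical helper names and the standard `esearch.fcgi`/`esummary.fcgi` endpoints; the XML string is a hand-written stand-in listing the IDs from the table above, not a captured NCBI response, and nothing is actually downloaded.

```r
# Hypothetical helpers (base R only) for chaining ESearch into ESummary.
eutils_base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# ESearch URL for a field-restricted query; spaces and [ ] are percent-encoded.
esearch_url <- function(term, db = "taxonomy") {
  paste0(eutils_base, "esearch.fcgi?db=", db,
         "&term=", URLencode(term, reserved = TRUE))
}

# Extract the UIDs from the <IdList> of an ESearch XML response.
parse_esearch_ids <- function(xml) {
  ids <- regmatches(xml, gregexpr("<Id>[0-9]+</Id>", xml))[[1]]
  gsub("</?Id>", "", ids)
}

# ESummary URL taking the UIDs as one comma-separated id parameter.
esummary_url <- function(ids, db = "taxonomy") {
  paste0(eutils_base, "esummary.fcgi?db=", db,
         "&id=", paste(ids, collapse = ","))
}

search_url <- esearch_url("Huitzilac virus[LNGE]")

# Hand-written stand-in for an ESearch response (IDs taken from the table above)
response <- paste0(
  "<eSearchResult><IdList>",
  "<Id>1000587</Id><Id>339351</Id><Id>11598</Id><Id>11571</Id>",
  "<Id>35301</Id><Id>439488</Id><Id>10239</Id><Id>1</Id>",
  "</IdList></eSearchResult>"
)
ids <- parse_esearch_ids(response)
summary_url <- esummary_url(ids)
print(summary_url)
```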
In addition, any search result or list of IDs can be passed directly
to esummary, efetch, or elink. I was using the XML results from EFetch
to parse the Lineage tag:
efetch("1000587,86782", db="taxonomy", retmode="xml")
<Lineage>Viruses; ssRNA viruses; ssRNA negative-strand viruses;
Bunyaviridae; Hantavirus; unclassified Hantavirus</Lineage>
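A minimal base-R sketch of that parsing step: the function name `parse_lineage()` is hypothetical, it uses regular expressions instead of an XML parser, and its input is the Lineage tag quoted above wrapped in a hand-trimmed TaxaSet record rather than a real EFetch response.

```r
# Hypothetical parser: pull each <Lineage> string out of EFetch taxonomy XML
# and split it into its ";"-separated components, ordered from the top-level
# group down to the parent taxon.
parse_lineage <- function(xml) {
  # (?s) lets "." match line breaks, since the tag text may be wrapped
  tags <- regmatches(xml, gregexpr("(?s)<Lineage>.*?</Lineage>", xml, perl = TRUE))[[1]]
  text <- gsub("</?Lineage>", "", tags)
  text <- gsub("[[:space:]]+", " ", text)  # collapse wrapped lines into single spaces
  strsplit(trimws(text), ";[[:space:]]*")
}

# Hand-trimmed record built around the Lineage tag quoted above
record <- "<TaxaSet><Taxon><TaxId>1000587</TaxId><ScientificName>Huitzilac virus</ScientificName>
<Lineage>Viruses; ssRNA viruses; ssRNA negative-strand viruses;
Bunyaviridae; Hantavirus; unclassified Hantavirus</Lineage></Taxon></TaxaSet>"

lineages <- parse_lineage(record)
print(lineages[[1]])
```

The pattern `<Lineage>` does not match the neighbouring `<LineageEx>` element, so only the plain lineage string is picked up.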
Long story short, the updated ncbiTaxonomy(1000587, "lineage") should be
able to return these results again in a couple of weeks. Sorry about the delay.
Chris Stubben
--
Los Alamos National Lab
BioScience Division