[Bioc-devel] NCBI taxonomy annotation
Levi Waldron
|w@|dron@re@e@rch @end|ng |rom gm@||@com
Sun Aug 8 20:10:06 CEST 2021
Does anyone else do mapping between NCBI taxids, names, and ranks? We do
this in curatedMetagenomicData, and soon in other packages, currently using
external files that lack provenance and versioning, so Ludwig Geistlinger
was looking for Bioconductor annotation resources. The closest he found was
in GenomeInfoDbData <https://bioconductor.org/packages/GenomeInfoDbData> but
this has only genus and species, and some quirks like Bacteria being listed
as a genus:
> library(GenomeInfoDbData)
> data(specData)
> head(specData)
  tax_id        genus     species
1      1          all        <NA>
2      1         root        <NA>
3      2     Bacteria        <NA>
4      6 Azorhizobium        <NA>
5      7 Azorhizobium caulinodans
6      9     Buchnera  aphidicola
> dim(specData)
[1] 2521271       3
> subset(specData, c(genus == "Escherichia" & species == "coli"))$tax_id
[1] 562
Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer
<maintainer at bioconductor.org>") about a pull request that would either
a) update specData to add columns for all taxonomic ranks, or b) create
a new object? Or another approach altogether? See
https://github.com/waldronlab/curatedMetagenomicData/issues/245.
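To make option (a) more concrete, here is a minimal base-R sketch of what a
"one column per rank" table could look like. Everything in it is an
assumption for illustration: the `nodes` table is a small hand-written
stand-in for the NCBI node and name tables (not an extract of specData or of
a real taxonomy dump), and `lineage()` and `wide` are hypothetical names, not
part of GenomeInfoDbData.

```r
# Toy stand-in for NCBI taxonomy nodes and names (hand-written, illustrative only).
nodes <- data.frame(
  tax_id    = c(1L, 131567L, 2L, 1224L, 1236L, 91347L, 543L, 561L, 562L),
  parent_id = c(1L, 1L, 131567L, 2L, 1224L, 1236L, 91347L, 543L, 561L),
  rank      = c("no rank", "no rank", "superkingdom", "phylum", "class",
                "order", "family", "genus", "species"),
  name      = c("root", "cellular organisms", "Bacteria", "Proteobacteria",
                "Gammaproteobacteria", "Enterobacterales",
                "Enterobacteriaceae", "Escherichia", "Escherichia coli"),
  stringsAsFactors = FALSE
)

# Ranks that become columns in the wide table (an assumed selection).
ranks <- c("superkingdom", "phylum", "class", "order",
           "family", "genus", "species")

# Hypothetical helper: walk from a taxid up to the root and record the
# name found at each of the requested ranks.
lineage <- function(id, nodes, ranks) {
  out <- setNames(rep(NA_character_, length(ranks)), ranks)
  repeat {
    i <- match(id, nodes$tax_id)
    if (is.na(i)) break                      # unknown taxid: return what we have
    if (nodes$rank[i] %in% ranks) out[[nodes$rank[i]]] <- nodes$name[i]
    if (nodes$parent_id[i] == id) break      # the root is its own parent
    id <- nodes$parent_id[i]
  }
  out
}

# One row per taxid, one column per rank.
wide <- data.frame(
  tax_id = nodes$tax_id,
  t(vapply(nodes$tax_id, lineage, character(length(ranks)),
           nodes = nodes, ranks = ranks)),
  stringsAsFactors = FALSE
)
```

With a table of this shape, `wide[wide$tax_id == 562L, ]` returns the full
lineage for one taxid, and Bacteria lands in the superkingdom column with NA
for genus rather than being listed as a genus.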
--
Levi Waldron
Associate Professor
Department of Epidemiology and Biostatistics
CUNY Graduate School of Public Health and Health Policy
Institute for Implementation Science in Population Health
55 W 125th St, New York NY 10035
https://waldronlab.io
Join the microbiome Virtual International Forum: https://microbiome-vif.org