[Bioc-devel] NCBI taxonomy annotation

Levi Waldron
Sun Aug 8 20:10:06 CEST 2021


Does anyone else map between NCBI taxids, names, and ranks? We do this in
curatedMetagenomicData, and soon will in other packages, currently using
external files that lack provenance and versioning, so Ludwig Geistlinger
went looking for Bioconductor annotation resources. The closest he found was
in GenomeInfoDbData <https://bioconductor.org/packages/GenomeInfoDbData>, but
this has only genus and species, and has some quirks, such as Bacteria being
listed as a genus:

> library(GenomeInfoDbData)
> data(specData)
> head(specData)
  tax_id        genus     species
1      1          all        <NA>
2      1         root        <NA>
3      2     Bacteria        <NA>
4      6 Azorhizobium        <NA>
5      7 Azorhizobium caulinodans
6      9     Buchnera  aphidicola
> dim(specData)
[1] 2521271       3
> subset(specData, genus == "Escherichia" & species == "coli")$tax_id
[1] 562

Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer
<maintainer at bioconductor.org>") about a pull request to either a) update
specData with additional columns for all taxonomic levels, or b) create
a new object? Or another approach altogether? See
https://github.com/waldronlab/curatedMetagenomicData/issues/245.
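
To make the "one column per taxonomic level" idea concrete, here is a rough
base-R sketch. It is not an existing GenomeInfoDbData API: the `nodes` table
is a small hand-typed stand-in for an NCBI taxonomy dump (abbreviated, with
intermediate "no rank" nodes omitted), and the names `lineage_row` and
`tax_table` are hypothetical. It only illustrates walking parent links to
fill one column per rank.

```r
# Hand-typed toy stand-in for an NCBI taxonomy dump (assumption, abbreviated).
nodes <- data.frame(
  tax_id = c(1L, 2L, 1224L, 1236L, 91347L, 543L, 561L, 562L),
  parent = c(1L, 1L, 2L, 1224L, 1236L, 91347L, 543L, 561L),
  rank   = c("no rank", "superkingdom", "phylum", "class",
             "order", "family", "genus", "species"),
  name   = c("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae",
             "Escherichia", "Escherichia coli"),
  stringsAsFactors = FALSE
)

# Ranks that become columns of the wide table.
ranks <- c("superkingdom", "phylum", "class", "order",
           "family", "genus", "species")

# Walk from one taxid up to the root, recording the name at each wanted rank.
lineage_row <- function(id, nodes, ranks) {
  out <- setNames(rep(NA_character_, length(ranks)), ranks)
  repeat {
    i <- match(id, nodes$tax_id)
    if (is.na(i)) break                       # unknown taxid: stop
    if (nodes$rank[i] %in% ranks) out[[nodes$rank[i]]] <- nodes$name[i]
    if (nodes$parent[i] == id) break          # root is its own parent
    id <- nodes$parent[i]
  }
  out
}

# One row per taxid, one column per rank; ranks below the taxon stay NA.
lineages <- do.call(rbind, lapply(nodes$tax_id, lineage_row,
                                  nodes = nodes, ranks = ranks))
tax_table <- data.frame(tax_id = nodes$tax_id, lineages,
                        stringsAsFactors = FALSE)

# Note: unlike specData, 'species' here holds the full binomial name.
subset(tax_table, species == "Escherichia coli")$tax_id   # 562
tax_table[tax_table$tax_id == 2L, c("superkingdom", "genus")]
```

With this layout Bacteria appears under `superkingdom` and has `NA` for
`genus`, rather than being listed as a genus. A real resource would also need
to record the date or version of the NCBI dump to address provenance.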

--

Levi Waldron

Associate Professor

Department of Epidemiology and Biostatistics

CUNY Graduate School of Public Health and Health Policy

Institute for Implementation Science in Population Health

55 W 125th St, New York NY 10035

https://waldronlab.io

Join the microbiome Virtual International Forum: https://microbiome-vif.org
