[Bioc-devel] NCBI taxonomy annotation

Sun Aug 8 20:10:06 CEST 2021

Does anyone else map between NCBI taxids, names, and ranks? We do this in
curatedMetagenomicData, and soon will in other packages, currently using
external files that lack provenance and versioning. Ludwig Geistlinger
therefore looked for Bioconductor annotation resources. The closest he found
was GenomeInfoDbData <https://bioconductor.org/packages/GenomeInfoDbData>,
but it has only genus and species columns, and some quirks such as Bacteria
being listed as a genus:

> library(GenomeInfoDbData)
> data(specData)
> head(specData)
  tax_id        genus     species
1      1          all        <NA>
2      1         root        <NA>
3      2     Bacteria        <NA>
4      6 Azorhizobium        <NA>
5      7 Azorhizobium caulinodans
6      9     Buchnera  aphidicola
> dim(specData)
[1] 2521271       3
> subset(specData, genus == "Escherichia" & species == "coli")$tax_id
[1] 562
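
As a rough sketch of what a table covering all ranks could look like, the
snippet below walks parent links in a small hand-typed excerpt shaped like the
NCBI taxdump (tax_id, parent, rank, name). Everything here is an assumption
for illustration: the object names (nodes, ranks, lineage, taxTable) are
hypothetical, the toy rows use rank labels as NCBI had them at the time of
writing, and nothing is read from GenomeInfoDbData or from a real dump file.

```r
# Toy excerpt in the shape of NCBI taxdump nodes/names (hand-typed; a real
# table would be built from a versioned taxdump with recorded provenance).
nodes <- data.frame(
  tax_id    = c(1L, 131567L, 2L, 1224L, 1236L, 91347L, 543L, 561L, 562L),
  parent_id = c(1L, 1L, 131567L, 2L, 1224L, 1236L, 91347L, 543L, 561L),
  rank      = c("no rank", "no rank", "superkingdom", "phylum", "class",
                "order", "family", "genus", "species"),
  name      = c("root", "cellular organisms", "Bacteria", "Proteobacteria",
                "Gammaproteobacteria", "Enterobacterales",
                "Enterobacteriaceae", "Escherichia", "Escherichia coli"),
  stringsAsFactors = FALSE
)

# Ranks to expose as columns (hypothetical choice).
ranks <- c("superkingdom", "phylum", "class", "order",
           "family", "genus", "species")

# Walk from a taxid up to the root, recording the name found at each rank.
lineage <- function(id, nodes, ranks) {
  out <- setNames(rep(NA_character_, length(ranks)), ranks)
  repeat {
    i <- match(id, nodes$tax_id)
    if (is.na(i)) break                      # unknown taxid: stop
    if (nodes$rank[i] %in% ranks) out[[nodes$rank[i]]] <- nodes$name[i]
    if (nodes$parent_id[i] == id) break      # root is its own parent
    id <- nodes$parent_id[i]
  }
  out
}

# One row per taxid, one column per rank.
wide <- do.call(rbind, lapply(nodes$tax_id, lineage,
                              nodes = nodes, ranks = ranks))
taxTable <- data.frame(tax_id = nodes$tax_id, rank = nodes$rank, wide,
                       stringsAsFactors = FALSE)

subset(taxTable, tax_id == 562L)
```

In this toy table the row for taxid 562 has "Escherichia" in the genus column,
while the row for taxid 2 has "Bacteria" under superkingdom and NA under
genus, which avoids the quirk shown in the transcript above.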

Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer
<maintainer at bioconductor.org>") about a pull request to either a) update
specData with additional columns for all taxonomic levels, or b) create a
new object? Or another approach altogether? See


