[Bioc-devel] NCBI taxonomy annotation

Brian Schilder brian_schilder at alumni.brown.edu
Mon Aug 9 22:15:48 CEST 2021


Hi Levi, 

I recently put together a new package called orthogene <https://github.com/neurogenomics/orthogene> (currently under review at Bioconductor). It has a convenience function, map_species(), for flexibly mapping species identifiers to other ID types, including NCBI taxonomy IDs.

It may not be as comprehensive as GenomeInfoDbData, but might still be useful. 
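
For reference, the kind of name-to-taxid lookup discussed in the quoted message below can be sketched in base R on a toy table. This is only an illustration: the rows are copied from the quoted head(specData) output plus the Escherichia coli entry (taxid 562), and the names `spec_toy` and `taxid_for` are hypothetical, not part of GenomeInfoDbData or orthogene.

```r
# Toy stand-in for GenomeInfoDbData's specData.
# Rows are copied from the output quoted in this thread; this is NOT the full table.
spec_toy <- data.frame(
  tax_id  = c(1L, 1L, 2L, 6L, 7L, 9L, 562L),
  genus   = c("all", "root", "Bacteria", "Azorhizobium",
              "Azorhizobium", "Buchnera", "Escherichia"),
  species = c(NA, NA, NA, NA, "caulinodans", "aphidicola", "coli"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the taxid(s) for a genus/species pair.
# Rows with a missing species (e.g. "Bacteria") are excluded explicitly,
# so NA comparisons never leak into the result.
taxid_for <- function(tab, genus, species) {
  hit <- !is.na(tab$species) & tab$genus == genus & tab$species == species
  tab$tax_id[hit]
}

taxid_for(spec_toy, "Escherichia", "coli")
```

The explicit NA check matters because entries such as "Bacteria" carry no species value, which is one of the quirks mentioned in the quoted message.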

Best, 
Brian
___________
Brian Schilder
PhD Candidate
UK Dementia Research Institute at Imperial College London
Faculty of Medicine, Department of Brain Sciences, Neurogenomics Lab
Profile | bit.ly/imperial_profile <https://bit.ly/imperial_profile>
LinkedIn | linkedin.com/in/brian-schilder <https://www.linkedin.com/in/brian-schilder/>
Twitter | twitter.com/BMSchilder <http://www.twitter.com/BMSchilder>
Lab | neurogenomics.co.uk <http://neurogenomics.co.uk/>
UK DRI | www.ukdri.ac.uk <http://www.ukdri.ac.uk/>


> On 8 Aug 2021, at 19:10, Levi Waldron <lwaldron.research using gmail.com> wrote:
> 
> Does anyone else do mapping between NCBI taxids, names, and ranks? We do
> this in curatedMetagenomicData and soon other packages, currently using
> external files that lack provenance and versioning, so Ludwig Geistlinger
> was looking for Bioconductor annotation resources. The closest he found was
> in GenomeInfoDbData <https://bioconductor.org/packages/GenomeInfoDbData> but
> this has only genus and species, and some quirks like Bacteria being listed
> as a genus:
> 
>> library(GenomeInfoDbData)
>> data(specData)
>> head(specData)
>  tax_id        genus     species
> 1      1          all        <NA>
> 2      1         root        <NA>
> 3      2     Bacteria        <NA>
> 4      6 Azorhizobium        <NA>
> 5      7 Azorhizobium caulinodans
> 6      9     Buchnera  aphidicola
>> dim(specData)
> [1] 2521271       3
>> subset(specData, c(genus == "Escherichia" & species == "coli"))$tax_id
> [1] 562
> 
> Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer
> <maintainer at bioconductor.org>") about a pull request either to a) update
> specData to add additional columns for all taxonomic levels, or b) creating
> a new object? Or, another approach altogether? See
> https://github.com/waldronlab/curatedMetagenomicData/issues/245.
> 
> --
> 
> Levi Waldron
> 
> Associate Professor
> 
> Department of Epidemiology and Biostatistics
> 
> CUNY Graduate School of Public Health and Health Policy
> 
> Institute for Implementation Science in Population Health
> 
> 55 W 125th St, New York NY 10035
> 
> https://waldronlab.io
> 
> Join the microbiome Virtual International Forum: https://microbiome-vif.org
> 

