[Bioc-devel] proposal for additional seqlevelsStyle

Pages, Herve hpages using fredhutch.org
Wed Dec 11 22:06:46 CET 2019

Hi Vince, Robert,

Looks like Vince wants the RefSeq accession, e.g. NC_000017.11 for chrom 
17 in GRCh38.

@Robert: Is this what you're also interested in?

The problem is that the RefSeq accessions are specific to a particular 
assembly (e.g. NC_000017.11 for chrom 17 in GRCh38 but NC_000017.10 for 
the same chrom in GRCh37).
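
To make the point concrete, here is a minimal base R sketch of a lookup that 
has to be keyed by assembly as well as by chromosome name. The table 
refseq_accessions and the helper ucsc_to_refseq() are hypothetical names 
made up for illustration; they are not part of GenomeInfoDb. Only the two 
chr17 accessions mentioned above are filled in.

```r
# Hypothetical lookup table: one row per (assembly, chromosome) pair.
# Only the two chr17 accessions quoted above are included.
refseq_accessions <- data.frame(
  assembly = c("GRCh37", "GRCh38"),
  ucsc     = c("chr17", "chr17"),
  refseq   = c("NC_000017.10", "NC_000017.11"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a GenomeInfoDb function): translate UCSC-style
# names to RefSeq accessions for a given assembly. Names or assemblies
# missing from the table come back as NA.
ucsc_to_refseq <- function(seqnames, assembly) {
  tab <- refseq_accessions[refseq_accessions$assembly == assembly, ]
  unname(tab$refseq[match(seqnames, tab$ucsc)])
}

ucsc_to_refseq("chr17", "GRCh37")
ucsc_to_refseq("chr17", "GRCh38")
```

The same UCSC name maps to a different accession depending on the assembly 
argument, so a single per-organism column cannot hold both values.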

Currently seqlevelsStyle() doesn't know how to distinguish between 
different assemblies of the same organism. Not saying it couldn't but it 
would require some thinking and some significant refactoring. It 
wouldn't be just a matter of adding a column to 


On 12/10/19 14:19, Robert Castelo wrote:
> I second this, and would suggest to name the style as 'GRC' for "Genome 
> Reference Consortium".
> thanks Vince for bringing this up, being able to easily switch between 
> genome styles is great.
> if 'paste0()' in R is one of the most influential contributions to 
> statistical computing
> https://simplystatistics.org/2013/01/31/paste0-is-statistical-computings-most-influential-contribution-of-the-21st-century
> i think that 'seqlevelsStyle()' from the GenomeInfoDb package is one of 
> the most influential contributions to human genetics, if you think about 
> the time invested by researchers in parsing and changing between 
> different styles of chromosome names :)
> robert.
> On 06/12/2019 15:03, Vincent Carey wrote:
>> I raised this issue previously with little response.
>> I'd propose that we add a column or two to genomeStyles()$Homo_sapiens
>> > head(genomeStyles()$Homo_sapiens, 2)
>>    circular auto   sex NCBI UCSC dbSNP Ensembl
>> 1    FALSE TRUE FALSE    1 chr1   ch1       1
>> 2    FALSE TRUE FALSE    2 chr2   ch2       2
>> that includes the values for "NCBI reference sequence names"
>> See
>> https://www.ncbi.nlm.nih.gov/nuccore/568815581
>> for one report on chr17, and
>> https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39
>> for a table that includes the Genbank labels.
>> Should I just file a PR at
>> https://github.com/Bioconductor/GenomeInfoDb/
>> after testing?

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
