[Bioc-devel] proposal for additional seqlevelsStyle

Robert Castelo robert.castelo at upf.edu
Fri Dec 13 17:01:14 CET 2019


Hi Hervé,

I didn't know about this new sequence style until Vince posted his
message, and we briefly talked about it at the European BioC meeting this
week in Brussels. However, I didn't realize that the style was specific to
a particular assembly. I have no use case for this at the moment,
i.e., I have not myself encountered any annotation or BAM file with
chromosome names written that way, so I don't know how pressing this
issue is. Maybe Vince can tell us how widespread such a chromosome naming
style may become in the near future.

Naively, I'd think that it would be a matter of adding one
reference-specific column per assembly, e.g., 'GRCh38.p13', 'GRCh37.p13',
etc., but I can imagine that the "reference style" concept might not be
the appropriate place to map all the different chromosome names of all the
individual human genomes uploaded to NCBI. Maybe we should wait until we
have a specific use case. Vince?
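A minimal base-R sketch of the per-assembly-column idea, under loudly stated assumptions: `hs_styles`, `to_refseq()` and `which_assembly()` are hypothetical names, not GenomeInfoDb API; the table is built by hand rather than taken from genomeStyles(); and only the chr17 accessions come from this thread (the chr1 accessions, NC_000001.11 for GRCh38 and NC_000001.10 for GRCh37, are added for illustration). It illustrates the point Hervé raises below: the same UCSC name maps to a different RefSeq accession in each assembly, so the caller has to say which assembly is meant.

```r
# Hypothetical sketch, not GenomeInfoDb API: one RefSeq column per assembly,
# keyed on a shared UCSC-style column. Column names follow the suggestion
# above; the chr1 rows are added for illustration.
hs_styles <- data.frame(
    UCSC       = c("chr1", "chr17"),
    GRCh38.p13 = c("NC_000001.11", "NC_000017.11"),
    GRCh37.p13 = c("NC_000001.10", "NC_000017.10"),
    stringsAsFactors = FALSE
)

# Translate UCSC-style names into RefSeq accessions for one assembly.
# Names missing from the table come back as NA instead of being guessed.
to_refseq <- function(seqnames, assembly, styles = hs_styles) {
    if (!assembly %in% setdiff(names(styles), "UCSC"))
        stop("no RefSeq column for assembly: ", assembly)
    styles[[assembly]][match(seqnames, styles$UCSC)]
}

# Reverse lookup: in this toy table the version suffix (.10 vs .11) differs
# between the two assemblies, so an accession points back to one column.
which_assembly <- function(acc, styles = hs_styles) {
    cols <- setdiff(names(styles), "UCSC")
    cols[vapply(cols, function(col) acc %in% styles[[col]], logical(1))]
}

to_refseq(c("chr17", "chr1"), "GRCh38.p13")
# [1] "NC_000017.11" "NC_000001.11"
to_refseq("chr17", "GRCh37.p13")
# [1] "NC_000017.10"
which_assembly("NC_000017.10")
# [1] "GRCh37.p13"
```

The lookup itself is trivial; what the sketch leaves open is how seqlevelsStyle() would learn which assembly column to use, which is the part Hervé expects would need significant refactoring.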

robert.

On 12/11/19 10:06 PM, Pages, Herve wrote:
> Hi Vince, Robert,
> 
> Looks like Vince wants the RefSeq accession, e.g. NC_000017.11 for chrom
> 17 in GRCh38.
> 
> @Robert: Is this what you're also interested in?
> 
> The problem is that the RefSeq accessions are specific to a particular
> assembly (e.g. NC_000017.11 for chrom 17 in GRCh38 but NC_000017.10 for
> the same chrom in GRCh37).
> 
> Currently seqlevelsStyle() doesn't know how to distinguish between
> different assemblies of the same organism. Not saying it couldn't, but it
> would require some thinking and some significant refactoring. It
> wouldn't be just a matter of adding a column to
> genomeStyles()$Homo_sapiens.
> 
> H.
> 
> 
> On 12/10/19 14:19, Robert Castelo wrote:
>> I second this, and would suggest naming the style 'GRC' for "Genome
>> Reference Consortium".
>>
>> Thanks, Vince, for bringing this up; being able to easily switch between
>> genome styles is great.
>>
>> If 'paste0()' in R is one of the most influential contributions to
>> statistical computing
>>
>> https://simplystatistics.org/2013/01/31/paste0-is-statistical-computings-most-influential-contribution-of-the-21st-century
>>
>> then I think that 'seqlevelsStyle()' from the GenomeInfoDb package is one of
>> the most influential contributions to human genetics, if you think about
>> the time researchers invest in parsing and converting between
>> different styles of chromosome names :)
>>
>> robert.
>>
>> On 06/12/2019 15:03, Vincent Carey wrote:
>>> I raised this issue previously with little response.
>>>
>>> I'd propose that we add a column or two to genomeStyles()$Homo_sapiens
>>>
>>>> head(genomeStyles()$Homo_sapiens, 2)
>>>   circular auto   sex NCBI UCSC dbSNP Ensembl
>>> 1    FALSE TRUE FALSE    1 chr1   ch1       1
>>> 2    FALSE TRUE FALSE    2 chr2   ch2       2
>>>
>>> that includes the values for "NCBI reference sequence names"
>>>
>>> See https://www.ncbi.nlm.nih.gov/nuccore/568815581 for one report on
>>> chr17, and https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39 for a
>>> table that includes the GenBank labels.
>>>
>>> Should I just file a PR at
>>> https://github.com/Bioconductor/GenomeInfoDb/ after testing?
>>>
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
> 

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
tel: +34.933.160.514
fax: +34.933.160.550
