[Bioc-devel] proposal for additional seqlevelsStyle
Pages, Herve
hpages at fredhutch.org
Wed Dec 11 22:06:46 CET 2019
Hi Vince, Robert,
Looks like Vince wants the RefSeq accession, e.g. NC_000017.11 for
chromosome 17 in GRCh38.
@Robert: Is this what you're also interested in?
The problem is that the RefSeq accessions are specific to a particular
assembly (e.g. NC_000017.11 for chrom 17 in GRCh38 but NC_000017.10 for
the same chrom in GRCh37).
Currently seqlevelsStyle() doesn't know how to distinguish between
different assemblies of the same organism. Not saying it couldn't but it
would require some thinking and some significant refactoring. It
wouldn't be just a matter of adding a column to
genomeStyles()$Homo_sapiens.
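
To illustrate why a single extra column is not enough, here is a minimal
base R sketch. The table `refseq` and the helper `to_refseq()` are
hypothetical names made up for this example and are not part of
GenomeInfoDb; only the two chr17 accessions mentioned above are filled in.
The point is just that the lookup has to be keyed by assembly as well as by
chromosome.

```r
# Hypothetical lookup table: RefSeq accessions keyed by assembly AND chromosome.
# Only the two chr17 accessions quoted in this thread are included.
refseq <- data.frame(
  assembly = c("GRCh37", "GRCh38"),
  UCSC     = c("chr17", "chr17"),
  RefSeq   = c("NC_000017.10", "NC_000017.11"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a GenomeInfoDb function): translate UCSC-style
# names to RefSeq accessions for one given assembly. Names without an entry
# for that assembly come back as NA.
to_refseq <- function(seqnames, assembly) {
  tab <- refseq[refseq$assembly == assembly, ]
  unname(setNames(tab$RefSeq, tab$UCSC)[seqnames])
}

to_refseq("chr17", "GRCh38")  # "NC_000017.11"
to_refseq("chr17", "GRCh37")  # "NC_000017.10"
```

The same UCSC name maps to two different accessions, so the assembly has to
be known before the style can be applied.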
H.
On 12/10/19 14:19, Robert Castelo wrote:
> I second this, and would suggest to name the style as 'GRC' for "Genome
> Reference Consortium".
>
> thanks Vince for bringing this up, being able to easily switch between
> genome styles is great.
>
> if 'paste0()' in R is one of the most influential contributions to
> statistical computing
>
> https://simplystatistics.org/2013/01/31/paste0-is-statistical-computings-most-influential-contribution-of-the-21st-century
>
> i think that 'seqlevelsStyle()' from the GenomeInfoDb package is one of
> the most influential contributions to human genetics, if you think about
> the time invested by researchers in parsing and changing between
> different styles of chromosome names :)
>
> robert.
>
> On 06/12/2019 15:03, Vincent Carey wrote:
>> I raised this issue previously with little response.
>>
>> I'd propose that we add a column or two to genomeStyles()$Homo_sapiens
>>
>>> head(genomeStyles()$Homo_sapiens, 2)
>>   circular auto   sex NCBI UCSC dbSNP Ensembl
>> 1    FALSE TRUE FALSE    1 chr1   ch1       1
>> 2    FALSE TRUE FALSE    2 chr2   ch2       2
>>
>> that includes the values for "NCBI reference sequence names"
>>
>> See
>> https://www.ncbi.nlm.nih.gov/nuccore/568815581
>> for one report on chr17,
>> and
>>
>> https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39
>>
>> for a table that includes the GenBank labels.
>>
>> Should I just file a PR at
>> https://github.com/Bioconductor/GenomeInfoDb/ after testing?
>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages using fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319