[Bioc-devel] BSgenome changes
Leonard Goldstein
goldstein.leonard using gene.com
Tue Aug 18 20:17:50 CEST 2020
Thanks for the explanation Hervé.
Best wishes
Leonard
On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès <hpages using fredhutch.org> wrote:
> On 8/18/20 01:40, Kasper Daniel Hansen wrote:
> > In light of this, could we get a version of GRCh37 with only a single
> > mitochondrial genome?
>
> You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would
> contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without
> the hg19:chrM sequence?
>
> Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you
> stay away from this mess. I'm not sure that adding yet another BSgenome
> package would make the situation less confusing.
>
> >
> > On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpages using fredhutch.org> wrote:
> >
> > Hi Felix,
> >
> > On 8/13/20 21:43, Felix Ernst wrote:
> > > Hi Leonard, Hi Herve,
> > >
> > > I followed your conversation, since I have noticed the same
> > > problem. Thanks, Herve, for the explanation of the recent changes on
> > > hg19.
> > >
> > > The GRCh37.p13 report states in its last line:
> > >
> > > MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
> > >
> > > Since the last column is called "UCSC-style-name", wouldn't that
> > > mean that chrM has to be renamed to MT and not chrMT?
> >
> > This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT
> > is the same as hg19:chrMT, not hg19:chrM.
> >
> > hg19:chrM and hg19:chrMT are **not** the same sequences. The former is
> > NC_001807 and has length 16571, and the latter is NC_012920.1 and has
> > length 16569.
> >
> > Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
> >
> > Cheers,
> > H.
> >
> > >
> > > Thanks again for the explanation.
> > >
> > > Cheers,
> > > Felix
> > >
> > > -----Original Message-----
> > > From: Bioc-devel <bioc-devel-bounces using r-project.org> On Behalf Of Hervé Pagès
> > > Sent: Friday, 14 August 2020 01:08
> > > To: Leonard Goldstein <goldstein.leonard using gene.com>; bioc-devel using r-project.org
> > > Cc: charlotte.soneson using fmi.ch
> > > Subject: Re: [Bioc-devel] BSgenome changes
> > >
> > > Hi Leonard,
> > >
> > > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> > >> Dear Bioc team,
> > >>
> > >> I'm following up on this recent GitHub issue
> > >> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue for
> > >> more details and code examples.
> > >>
> > >> It looks like changes in Bioc devel result in two copies of the
> > >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named
> > >> chrM like in previous package versions (length 16571) and one named
> > >> chrMT (length 16569).
> > >>
> > >> When using seqlevelsStyle() to change chromosome names from UCSC to
> > >> NCBI format, this results in new behavior -- in the past chrM was
> > >> simply renamed MT, now the different sequence chrMT is used. Is
> > >> this intended?
> > >
> > > Absolutely intended.
> > >
> > > There is a long story behind the unfortunate fate of the
> > > mitochondrial chromosome in hg19. I'll try to keep it short.
> > >
> > > When the UCSC folks released the hg19 browser more than 10 years
> > > ago, they based it on assembly GRCh37:
> > >
> > > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
> > >
> > > See sequence report for GRCh37:
> > >
> > > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
> > >
> > > For some mysterious reason GRCh37 didn't include the
> > > mitochondrial chromosome, so the UCSC folks decided to use
> > > mitochondrial sequence NC_001807 and called it chrM.
> > >
> > > However, UCSC has recently decided to base hg19 on GRCh37.p13
> > > instead of GRCh37. A rather surprising move after many years of hg19
> > > being based on the latter.
> > >
> > > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> > >
> > > See sequence report for GRCh37.p13:
> > >
> > > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
> > >
> > > Note that GRCh37.p13 does include the mitochondrial chromosome.
> > > It's called MT in the official sequence report above and chrMT in
> > > hg19.
> > >
> > > At the same time the UCSC folks decided to keep chrM, so now hg19
> > > contains 2 mitochondrial sequences: chrM and chrMT. Previously it
> > > had only one: chrM.
> > >
> > > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
> > > seqlevelsStyle(genome) is only reflecting this. In particular
> > > seqlevelsStyle(genome) <- "NCBI" now does the following:
> > >
> > > - Rename chrMT -> MT.
> > >
> > > - chrM does NOT get renamed. There is no point in renaming
> > >   this sequence because it has no equivalent in GRCh37.p13.
> > >
> > > Hope this helps,
> > >
> > > H.
> > >
> > >>
> > >> Leonard
> > >>
> > >> _______________________________________________
> > >> Bioc-devel using r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>
> > >
> > > --
> > > Hervé Pagès
> > >
> > > Program in Computational Biology
> > > Division of Public Health Sciences
> > > Fred Hutchinson Cancer Research Center
> > > 1100 Fairview Ave. N, M1-B514
> > > P.O. Box 19024
> > > Seattle, WA 98109-1024
> > >
> > > E-mail: hpages using fredhutch.org
> > > Phone: (206) 667-5791
> > > Fax: (206) 667-1319
> > >
> >
> >
> > --
> > Best,
> > Kasper
>
> --
> Hervé Pagès
>
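For readers skimming the thread, the renaming behavior described above for seqlevelsStyle(genome) <- "NCBI" can be modeled with a small lookup table. This is only a sketch in Python, not how GenomeInfoDb implements it: the table HG19_MITO and the function to_ncbi are hypothetical names made up for illustration. The accessions, lengths, and the chrMT -> MT mapping are the ones stated in the quoted messages.

```python
# Illustrative model only. HG19_MITO and to_ncbi are hypothetical names,
# not part of GenomeInfoDb or BSgenome.
# Facts taken from the thread:
#   hg19:chrMT is NC_012920.1 (length 16569) and corresponds to GRCh37.p13:MT
#   hg19:chrM  is NC_001807   (length 16571) and has no GRCh37.p13 equivalent
HG19_MITO = {
    "chrM": {"accession": "NC_001807", "length": 16571, "ncbi_name": None},
    "chrMT": {"accession": "NC_012920.1", "length": 16569, "ncbi_name": "MT"},
}


def to_ncbi(seqnames):
    """Rename sequences that have an NCBI equivalent; leave the rest unchanged."""
    renamed = []
    for name in seqnames:
        entry = HG19_MITO.get(name)
        if entry is not None and entry["ncbi_name"] is not None:
            renamed.append(entry["ncbi_name"])
        else:
            # No equivalent in GRCh37.p13 (or not in this toy table): keep as-is.
            renamed.append(name)
    return renamed


print(to_ncbi(["chrM", "chrMT"]))  # prints ['chrM', 'MT']
```

The point of the model is that chrM and chrMT are distinct records with different accessions and lengths, so mapping chrM to MT would silently swap one sequence for another.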