[Bioc-devel] BSgenome changes

Leonard Goldstein goldstein.leonard at gene.com
Tue Aug 18 20:17:50 CEST 2020


Thanks for the explanation, Hervé.

Best wishes

Leonard


On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès <hpages at fredhutch.org> wrote:

> On 8/18/20 01:40, Kasper Daniel Hansen wrote:
> > In light of this, could we get a version of GRCh37 with only a single
> > mitochondrial genome?
>
> You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would
> contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without
> the hg19:chrM sequence?
>
> Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you
> stay away from this mess. I'm not sure that adding yet another BSgenome
> package would make the situation less confusing.
>
> >
> > On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> >     Hi Felix,
> >
> >     On 8/13/20 21:43, Felix Ernst wrote:
> >      > Hi Leonard, Hi Herve,
> >      >
> >      > I followed your conversation, since I have noticed the same
> >      > problem. Thanks, Herve, for the explanation of the recent changes
> >      > on hg19.
> >      >
> >      > The GRCh37.p13 report states in its last line:
> >      >
> >      > MT    assembled-molecule      MT      Mitochondrion   J01415.2        =       NC_012920.1     non-nuclear     16569   chrM
> >      >
> >      > Since the last column is called "UCSC-style-name", wouldn't that
> >      > mean that chrM has to be renamed to MT and not chrMT?
> >
> >     This is a mistake in the sequence report for GRCh37.p13.
> >     GRCh37.p13:MT is the same as hg19:chrMT, not hg19:chrM.
> >
> >     hg19:chrM and hg19:chrMT are **not** the same sequences. The former
> >     is NC_001807 and has length 16571, and the latter is NC_012920.1 and
> >     has length 16569.
> >
> >     Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
> >
> >     Cheers,
> >     H.
> >
> >      >
> >      > Thanks again for the explanation.
> >      >
> >      > Cheers,
> >      > Felix
> >      >
> >      > -----Original Message-----
> >      > From: Bioc-devel <bioc-devel-bounces at r-project.org> On Behalf Of Hervé Pagès
> >      > Sent: Friday, 14 August 2020 01:08
> >      > To: Leonard Goldstein <goldstein.leonard at gene.com>; bioc-devel at r-project.org
> >      > Cc: charlotte.soneson at fmi.ch
> >      > Subject: Re: [Bioc-devel] BSgenome changes
> >      >
> >      > Hi Leonard,
> >      >
> >      > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> >      >> Dear Bioc team,
> >      >>
> >      >> I'm following up on this recent GitHub issue
> >      >> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue
> >      >> for more details and code examples.
> >      >>
> >      >> It looks like changes in Bioc devel result in two copies of the
> >      >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one
> >      >> named chrM like in previous package versions (length 16571) and
> >      >> one named chrMT (length 16569).
> >      >>
> >      >> When using seqlevelsStyle() to change chromosome names from UCSC
> >      >> to NCBI format, this results in new behavior -- in the past chrM
> >      >> was simply renamed MT, now the different sequence chrMT is used.
> >      >> Is this intended?
> >      >
> >      > Absolutely intended.
> >      >
> >      > There is a long story behind the unfortunate fate of the
> >      > mitochondrial chromosome in hg19. I'll try to keep it short.
> >      >
> >      > When the UCSC folks released the hg19 browser more than 10 years
> >      > ago, they based it on assembly GRCh37:
> >      >
> >      > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
> >      >
> >      > See sequence report for GRCh37:
> >      >
> >      > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
> >      >
> >      > For some mysterious reason GRCh37 didn't include the mitochondrial
> >      > chromosome so the UCSC folks decided to use mitochondrial sequence
> >      > NC_001807 and called it chrM.
> >      >
> >      > However, UCSC has recently decided to base hg19 on GRCh37.p13
> >      > instead of GRCh37. A rather surprising move after many years of
> >      > hg19 being based on the latter.
> >      >
> >      > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> >      >
> >      > See sequence report for GRCh37.p13:
> >      >
> >      > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
> >      >
> >      > Note that GRCh37.p13 does include the mitochondrial chromosome.
> >      > It's called MT in the official sequence report above and chrMT in
> >      > hg19.
> >      >
> >      > At the same time the UCSC folks decided to keep chrM so now hg19
> >      > contains 2 mitochondrial sequences: chrM and chrMT. Previously it
> >      > had only one: chrM.
> >      >
> >      > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and
> >      > with seqlevelsStyle(genome) is only reflecting this. In particular,
> >      > seqlevelsStyle(genome) <- "NCBI" now does the following:
> >      >
> >      >     - Rename chrMT -> MT.
> >      >
> >      >     - chrM does NOT get renamed. There is no point in renaming
> >      >       this sequence because it has no equivalent in GRCh37.p13.
> >      >
> >      > Hope this helps,
> >      >
> >      > H.
> >      >
> >      >>
> >      >> Leonard
> >      >>
> >      >>      [[alternative HTML version deleted]]
> >      >>
> >      >> _______________________________________________
> >      >> Bioc-devel at r-project.org mailing list
> >      >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >      >>
> >      >
> >      > --
> >      > Hervé Pagès
> >      >
> >      > Program in Computational Biology
> >      > Division of Public Health Sciences
> >      > Fred Hutchinson Cancer Research Center
> >      > 1100 Fairview Ave. N, M1-B514
> >      > P.O. Box 19024
> >      > Seattle, WA 98109-1024
> >      >
> >      > E-mail: hpages at fredhutch.org
> >      > Phone:  (206) 667-5791
> >      > Fax:    (206) 667-1319
> >      >
> > --
> > Best,
> > Kasper
>
>
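
The renaming rule described in the quoted messages can be illustrated with a small toy model. This is only a sketch in plain Python, not how seqlevelsStyle() is implemented in GenomeInfoDb: the dictionaries, the Seq tuple and the to_ncbi_names() helper are hypothetical names made up for illustration, and the accessions and lengths are the ones quoted in the thread. It assumes that a UCSC sequence is renamed only when the NCBI assembly contains a sequence with the same accession.

```python
from typing import Dict, NamedTuple


class Seq(NamedTuple):
    """Hypothetical record for one sequence: accession and length in bases."""
    accession: str
    length: int


# The two mitochondrial sequences in hg19, as described in the thread.
HG19: Dict[str, Seq] = {
    "chrM": Seq("NC_001807", 16571),
    "chrMT": Seq("NC_012920.1", 16569),
}

# The single mitochondrial sequence in GRCh37.p13, as described in the thread.
GRCH37_P13: Dict[str, Seq] = {
    "MT": Seq("NC_012920.1", 16569),
}


def to_ncbi_names(ucsc: Dict[str, Seq], ncbi: Dict[str, Seq]) -> Dict[str, str]:
    """Map each UCSC name to the NCBI name that has the same accession.

    A sequence with no equivalent in the NCBI assembly keeps its UCSC name.
    """
    name_by_accession = {seq.accession: name for name, seq in ncbi.items()}
    return {
        name: name_by_accession.get(seq.accession, name)
        for name, seq in ucsc.items()
    }


renamed = to_ncbi_names(HG19, GRCH37_P13)
print(renamed)  # {'chrM': 'chrM', 'chrMT': 'MT'}
```

Matching on accession rather than on name is what keeps chrM (NC_001807, length 16571) from being relabelled as MT, which in GRCh37.p13 refers to NC_012920.1 (length 16569).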
