[Bioc-devel] BSgenome changes

Kasper Daniel Hansen
Tue Aug 18 10:40:58 CEST 2020


In light of this, could we get a version of GRCh37 with only a single
mitochondrial genome?

On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpages using fredhutch.org> wrote:

> Hi Felix,
>
> On 8/13/20 21:43, Felix Ernst wrote:
> > Hi Leonard, Hi Herve,
> >
> > I followed your conversation, since I have noticed the same problem.
> > Thanks, Herve, for the explanation of the recent changes on hg19.
> >
> > The GRCh37.P13 report states in its last line:
> >
> > MT    assembled-molecule      MT      Mitochondrion   J01415.2        =       NC_012920.1     non-nuclear     16569   chrM
> >
> > Since the last column is called "UCSC-style-name", wouldn't that mean
> > that chrM, and not chrMT, has to be renamed to MT?
>
> This is a mistake in the sequence report for GRCh37.p13. GRCh37.p13:MT
> is the same as hg19:chrMT, not hg19:chrM.
>
> hg19:chrM and hg19:chrMT are **not** the same sequences. The former is
> NC_001807 and has length 16571 and the latter is NC_012920.1 and has
> length 16569.
>
> Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
>
> Cheers,
> H.
>
> >
> > Thanks again for the explanation.
> >
> > Cheers,
> > Felix
> >
> > -----Original Message-----
> > From: Bioc-devel <bioc-devel-bounces using r-project.org> On Behalf Of Hervé Pagès
> > Sent: Friday, 14 August 2020 01:08
> > To: Leonard Goldstein <goldstein.leonard using gene.com>; bioc-devel using r-project.org
> > Cc: charlotte.soneson using fmi.ch
> > Subject: Re: [Bioc-devel] BSgenome changes
> >
> > Hi Leonard,
> >
> > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> >> Dear Bioc team,
> >>
> >> I'm following up on this recent GitHub issue
> >> <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue for
> >> more details and code examples.
> >>
> >> It looks like changes in Bioc devel result in two copies of the
> >> mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one named
> >> chrM like in previous package versions (length 16571) and one named
> >> chrMT (length 16569).
> >>
> >> When using seqlevelsStyle() to change chromosome names from UCSC to
> >> NCBI format, this results in new behavior -- in the past chrM was
> >> simply renamed MT, now the different sequence chrMT is used. Is this
> >> intended?
> >
> > Absolutely intended.
> >
> > There is a long story behind the unfortunate fate of the mitochondrial
> > chromosome in hg19. I'll try to keep it short.
> >
> > When the UCSC folks released the hg19 browser more than 10 years ago,
> > they based it on assembly GRCh37:
> >
> > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
> >
> > See sequence report for GRCh37:
> >
> > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
> >
> > For some mysterious reason GRCh37 didn't include the mitochondrial
> > chromosome, so the UCSC folks decided to use mitochondrial sequence
> > NC_001807 and called it chrM.
> >
> > However, UCSC has recently decided to base hg19 on GRCh37.p13 instead of
> > GRCh37. A rather surprising move after many years of hg19 being based on
> > the latter.
> >
> > https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> >
> > See sequence report for GRCh37.p13:
> >
> > https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
> >
> > Note that GRCh37.p13 does include the mitochondrial chromosome. It's
> > called MT in the official sequence report above and chrMT in hg19.
> >
> > At the same time the UCSC folks decided to keep chrM, so now hg19
> > contains 2 mitochondrial sequences: chrM and chrMT. Previously it had
> > only one: chrM.
> >
> > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and with
> > seqlevelsStyle(genome) is only reflecting this. In particular
> > seqlevelsStyle(genome) <- "NCBI" now does the following:
> >
> >     - Rename chrMT -> MT.
> >
> >     - chrM does NOT get renamed. There is no point in renaming this
> >       sequence because it has no equivalent in GRCh37.p13.
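
The renaming rule described in the two bullets above can be modelled in a few lines. This is only a toy sketch in Python, not the GenomeInfoDb implementation behind seqlevelsStyle(): the sequence names and lengths come from this thread, while the function name `to_ncbi_style` and the dictionary layout are hypothetical, and only the two mitochondrial sequences are modelled.

```python
# Toy model of the UCSC -> NCBI renaming described above for hg19.
# Names and lengths are taken from the thread; the function and the
# dict layout are hypothetical and are not GenomeInfoDb code.

# Mitochondrial sequences in the updated hg19, with their lengths.
HG19_MITO = {"chrM": 16571, "chrMT": 16569}

# UCSC -> NCBI name mapping: only chrMT has a counterpart in GRCh37.p13.
UCSC_TO_NCBI = {"chrMT": "MT"}


def to_ncbi_style(seqlengths):
    """Rename sequences that have an NCBI equivalent; leave the rest as-is."""
    return {UCSC_TO_NCBI.get(name, name): length
            for name, length in seqlengths.items()}


renamed = to_ncbi_style(HG19_MITO)
print(renamed)  # {'chrM': 16571, 'MT': 16569}
```

In this model chrM keeps its UCSC-style name after the switch, so an object can end up with mixed naming styles, and MT now refers to the 16569 bp sequence rather than to the old 16571 bp chrM.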
> >
> > Hope this helps,
> >
> > H.
> >
> >>
> >> Leonard
> >>
> >> _______________________________________________
> >> Bioc-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages using fredhutch.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> >
>


-- 
Best,
Kasper
