[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Aaron Lun infinite.monkeys.with.keyboards using gmail.com
Sat Apr 27 01:21:52 CEST 2019


Thanks Daniel. Glad to see the end of that monkey business, my analyses
were going bananas.

On Fri, Apr 26, 2019 at 3:41 PM Van Twisk, Daniel <
Daniel.VanTwisk using roswellpark.org> wrote:

> I've pushed new 3.8.2 orgdbs that should propagate soon. They do not have
> this issue.
> ------------------------------
> *From:* Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Pages,
> Herve <hpages using fredhutch.org>
> *Sent:* Thursday, April 25, 2019 9:19:35 PM
> *To:* Aaron Lun; Vincent Carey
> *Cc:* Bioc-devel; jmacdon using u.washington.edu
> *Subject:* Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Hi Aaron,
>
> On 4/25/19 16:44, Aaron Lun wrote:
>
> It doesn't seem like it - on my installation, org.Hs.eg.db is still...
> monkeying around.
>
>
>           __
>      w  c(..)o   (
>       \__(-)    __)
>           /\   (
>          /(_)___)
>          w /|
>           | \
>          m  m
>
> Daniel has prepared a new batch of *.db0 and org.* packages (v 3.8.1). The
> new packages are on their way and should become available via
> BiocManager::install() in the next 12 hours or so.
>
> Hopefully they'll put an end to the Great Monkey Conspiracy!
>
> Unfortunately we won't see the effect on tomorrow's build report, only on
> Saturday's report.
>
> Cheers,
>
> H.
>
>
>
>
>
> On Thu, Apr 25, 2019 at 9:17 AM Vincent Carey
> <stvjc using channing.harvard.edu> wrote:
>
>
>
> Has this situation been rectified?
>
> On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <
> Daniel.VanTwisk using roswellpark.org> wrote:
>
>
>
> We've made some changes to our annotation generation scripts this release
> and it seems these may have introduced some errors. Thank you for
> identifying this issue and I will try to have some fixes out asap.
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of James
> W. MacDonald <jmacdon using uw.edu>
> Sent: Tuesday, April 23, 2019 11:03:02 AM
> To: Aaron Lun
> Cc: Bioc-devel
> Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Looks like the ensembl table of the human.db0 package got polluted with
> *Pan troglodytes* genes:
>
>  > con <- dbConnect(SQLite(), "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
>  > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
>   count(*)
> 1    16207
>  > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
>   count(*)
> 1    28973
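
A check of this kind can also be sketched in base R without opening the SQLite file: strip the trailing digits from each Ensembl ID and tabulate the prefixes that remain. The vector `ids` below is a hypothetical stand-in holding only the two identifiers reported in this thread, not data read from human.db0.

```r
# Hypothetical input: the two Ensembl gene IDs reported in this thread.
ids <- c("ENSPTRG00000000777", "ENSG00000115263")

# Remove the trailing run of digits, leaving the species/feature prefix.
prefix <- sub("[0-9]+$", "", ids)

# Count IDs per prefix; a prefix other than "ENSG" does not follow the
# human gene ID pattern.
prefix_counts <- table(prefix)
print(prefix_counts)

# IDs whose prefix is not the human gene prefix.
non_human <- ids[prefix != "ENSG"]
print(non_human)
```

On a real table, `ids` would be the column of Ensembl identifiers, and any unexpected prefix in the tabulation would point to the same sort of contamination.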
>
> On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <
> infinite.monkeys.with.keyboards using gmail.com> wrote:
>
>
>
> Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
>
>  > library(org.Hs.eg.db)
>  > mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>                   GCG
> "ENSPTRG00000000777"
>
> Well, at least it still recovers the right identifier... eventually.
>
>  > select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>    SYMBOL            ENSEMBL
> 1    GCG ENSPTRG00000000777
> 2    GCG    ENSG00000115263
>
> The SYMBOL->Entrez ID relational table seems to be okay:
>
>  > Y <- toTable(org.Hs.egSYMBOL)
>  > Y[which(Y[,2]=="GCG"),]
>       gene_id symbol
> 2152    2641    GCG
>
> So the cause is the Ensembl->Entrez mappings:
>
>  > Z <- toTable(org.Hs.egENSEMBL2EG)
>  > Z[Z[,1]==2641,]
>       gene_id         ensembl_id
> 3028    2641 ENSPTRG00000000777
> 3029    2641    ENSG00000115263
>
> Googling suggests that ENSPTRG00000000777 is an identifier for some
> other gene in one of the other monkeys. Hardly "Hs" stuff.
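
Until corrected packages are installed, one possible stopgap is to drop the rows whose Ensembl ID does not match the human gene pattern (`ENSG` followed by digits). The sketch below is base R only; `res` is a hand-built copy of the `select()` output shown earlier in this message, and `keep_human_ensembl()` is a hypothetical helper, not part of AnnotationDbi.

```r
# Hand-built copy of the select() result from this thread.
res <- data.frame(
    SYMBOL = c("GCG", "GCG"),
    ENSEMBL = c("ENSPTRG00000000777", "ENSG00000115263"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose ID is "ENSG" followed only by digits.
# Rows with a missing ID are dropped too, because grepl() returns FALSE for NA.
keep_human_ensembl <- function(df, column = "ENSEMBL") {
    df[grepl("^ENSG[0-9]+$", df[[column]]), , drop = FALSE]
}

filtered <- keep_human_ensembl(res)
print(filtered)
```

This only hides the stray identifiers in the query result; it does not repair the underlying mapping table.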
>
> Session info (not technically R 3.6, but I didn't think that would have
> been the cause):
>
>
>
> R Under development (unstable) (2019-04-11 r76379)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default
> BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
> LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
> [4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
>  [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
>  [9] pkgconfig_2.0.2 memoise_1.1.0
>
>
>
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>         [[alternative HTML version deleted]]
>
> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the employee or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited.  If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages using fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319

	[[alternative HTML version deleted]]


