[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
Pages, Herve
hpages using fredhutch.org
Fri Apr 26 03:19:35 CEST 2019
Hi Aaron,
On 4/25/19 16:44, Aaron Lun wrote:
It doesn't seem like it - on my installation, org.Hs.eg.db is still...
monkeying around.
__
w c(..)o (
\__(-) __)
/\ (
/(_)___)
w /|
| \
m m
Daniel has prepared a new batch of *.db0 and org.* packages (v 3.8.1). The new packages are on their way and should become available via BiocManager::install() in the next 12 hours or so.
Hopefully they'll put an end to the Great Monkey Conspiracy!
Unfortunately we won't see the effect on tomorrow's build report, only on Saturday's report.
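For anyone wanting to confirm that the fixed package has reached their library, here is a minimal sketch. The helper name `needs_update` is hypothetical, and the 3.8.1 threshold is taken from the version mentioned above; the install and lookup calls are left commented out because they need network access and an installed package.

```r
# Hypothetical helper: TRUE when the installed version predates the fix.
# The default threshold (3.8.1) is the version announced in this thread.
needs_update <- function(installed, fixed = "3.8.1") {
  package_version(installed) < package_version(fixed)
}

# The affected release from this thread versus the rebuilt one.
print(needs_update("3.8.0"))
print(needs_update("3.8.1"))

# With the real package one might run (not executed here):
#   needs_update(as.character(packageVersion("org.Hs.eg.db")))
#   BiocManager::install("org.Hs.eg.db")
```

If the helper reports TRUE for the installed version, re-running BiocManager::install() once the new builds have propagated should pick up the rebuilt package.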
Cheers,
H.
On Thu, Apr 25, 2019 at 9:17 AM Vincent Carey <stvjc using channing.harvard.edu> wrote:
Has this situation been rectified?
On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <Daniel.VanTwisk using roswellpark.org> wrote:
We've made some changes to our annotation generation scripts this release
and it seems these may have introduced some errors. Thank you for
identifying this issue and I will try to have some fixes out asap.
________________________________
From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of James W. MacDonald <jmacdon using uw.edu>
Sent: Tuesday, April 23, 2019 11:03:02 AM
To: Aaron Lun
Cc: Bioc-devel
Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
Looks like the ensembl table of the human.db0 package got polluted with
*Pan troglodytes* genes:
con <- dbConnect(SQLite(), "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
  count(*)
1    16207
dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
  count(*)
1    28973
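The two counts above could be generalized into a tally over every identifier prefix, which would also reveal contamination from species other than chimp. A rough sketch in base R follows; `tally_prefixes` is a hypothetical helper, and the `ids` vector is a toy stand-in built from the identifiers quoted in this thread rather than real database contents.

```r
# Toy stand-in for the `ensid` column. With the real database one might use
# (assuming the connection `con` from above):
#   ids <- dbGetQuery(con, "select ensid from ensembl")$ensid
ids <- c("ENSG00000115263", "ENSPTRG00000000777", "ENSG00000115263")

# Hypothetical helper: strip the trailing run of digits to recover the
# prefix of each identifier, then count identifiers per prefix.
tally_prefixes <- function(ids) {
  prefix <- sub("[0-9]+$", "", ids)
  sort(table(prefix), decreasing = TRUE)
}

counts <- tally_prefixes(ids)
print(counts)
```

In a clean human table one would expect the gene-level prefix to be ENSG only; any other prefix in the tally points at rows from another species.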
On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <infinite.monkeys.with.keyboards using gmail.com> wrote:
Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
> library(org.Hs.eg.db)
> mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
GCG
"ENSPTRG00000000777"
Well, at least it still recovers the right identifier... eventually.
> select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
SYMBOL ENSEMBL
1 GCG ENSPTRG00000000777
2 GCG ENSG00000115263
The SYMBOL->Entrez ID relational table seems to be okay:
> Y <- toTable(org.Hs.egSYMBOL)
> Y[which(Y[,2]=="GCG"),]
gene_id symbol
2152 2641 GCG
So the cause is the Ensembl->Entrez mappings:
> Z <- toTable(org.Hs.egENSEMBL2EG)
> Z[Z[,1]==2641,]
gene_id ensembl_id
3028 2641 ENSPTRG00000000777
3029 2641 ENSG00000115263
Googling suggests that ENSPTRG00000000777 is an identifier for some
other gene in one of the other monkeys. Hardly "Hs" stuff.
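Until a fixed package is available, one possible workaround is to drop any mapping whose Ensembl ID does not look like a human gene ID. The sketch below is only an illustration: it runs on a toy copy of the select() output shown above, and it assumes that human Ensembl gene IDs are "ENSG" followed by 11 digits, so anything else is discarded.

```r
# Toy copy of the select() result from this message.
res <- data.frame(
  SYMBOL  = c("GCG", "GCG"),
  ENSEMBL = c("ENSPTRG00000000777", "ENSG00000115263"),
  stringsAsFactors = FALSE
)

# Keep only rows whose ID matches the human gene pattern (ENSG + 11 digits).
# Species-specific prefixes such as ENSPTRG (chimp) fail this test.
is_human <- grepl("^ENSG[0-9]{11}$", res$ENSEMBL)
human_only <- res[is_human, , drop = FALSE]
print(human_only)
```

The same filter could be applied to the real output of select() before passing the IDs downstream; it hides the symptom but does not repair the underlying table.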
Session info (not technically R 3.6, but I didn't think that would have
been the cause):
R Under development (unstable) (2019-04-11 r76379)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /home/luna/Software/R/trunk/lib/libRblas.so
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_3.8.0 AnnotationDbi_1.45.1 IRanges_2.17.5
[4] S4Vectors_0.21.23 Biobase_2.43.1 BiocGenerics_0.29.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 digest_0.6.18 DBI_1.0.0 RSQLite_2.1.1
[5] blob_1.1.1 bit64_0.9-7 bit_1.1-14 compiler_3.7.0
[9] pkgconfig_2.0.2 memoise_1.1.0
_______________________________________________
Bioc-devel using r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=pRzAhoukTjoi6JCrxpZEHER0Dj7wqeCghzULGLFaTNQ&s=MxM9vCqiDsqvIw8l3iyam0_WN-7LHwlr6YiG_zb4vkQ&e=
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages using fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319