[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
Vincent Carey
@tvjc @end|ng |rom ch@nn|ng@h@rv@rd@edu
Thu Apr 25 18:17:48 CEST 2019
Has this situation been rectified?
On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <
Daniel.VanTwisk using roswellpark.org> wrote:
> We've made some changes to our annotation generation scripts this release
> and it seems these may have introduced some errors. Thank you for
> identifying this issue and I will try to have some fixes out asap.
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of James W.
> MacDonald <jmacdon using uw.edu>
> Sent: Tuesday, April 23, 2019 11:03:02 AM
> To: Aaron Lun
> Cc: Bioc-devel
> Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>
> Looks like the ensembl table of the human.db0 package got polluted with
> *Pan troglodytes* genes:
>
> > con <- dbConnect(SQLite(),
> "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
> count(*)
> 1 16207
> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
> count(*)
> 1 28973
>
> On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <
> infinite.monkeys.with.keyboards using gmail.com> wrote:
>
> > Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
> >
> > > library(org.Hs.eg.db)
> > > mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
> > 'select()' returned 1:many mapping between keys and columns
> > GCG
> > "ENSPTRG00000000777"
> >
> > Well, at least it still recovers the right identifier... eventually.
> >
> > > select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
> > 'select()' returned 1:many mapping between keys and columns
> > SYMBOL ENSEMBL
> > 1 GCG ENSPTRG00000000777
> > 2 GCG ENSG00000115263
> >
> > The SYMBOL->Entrez ID relational table seems to be okay:
> >
> > > Y <- toTable(org.Hs.egSYMBOL)
> > > Y[which(Y[,2]=="GCG"),]
> > gene_id symbol
> > 2152 2641 GCG
> >
> > So the cause is the Ensembl->Entrez mappings:
> >
> > > Z <- toTable(org.Hs.egENSEMBL2EG)
> > > Z[Z[,1]==2641,]
> > gene_id ensembl_id
> > 3028 2641 ENSPTRG00000000777
> > 3029 2641 ENSG00000115263
> >
> > Googling suggests that ENSPTRG00000000777 is an identifier for some
> > other gene in one of the other monkeys. Hardly "Hs" stuff.
> >
> > Session info (not technically R 3.6, but I didn't think that would have
> > been the cause):
> >
> > > R Under development (unstable) (2019-04-11 r76379)
> > > Platform: x86_64-pc-linux-gnu (64-bit)
> > > Running under: Ubuntu 18.04.2 LTS
> > >
> > > Matrix products: default
> > > BLAS: /home/luna/Software/R/trunk/lib/libRblas.so
> > > LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
> > >
> > > locale:
> > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> > > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] parallel stats4 stats graphics grDevices utils datasets
> > > [8] methods base
> > >
> > > other attached packages:
> > > [1] org.Hs.eg.db_3.8.0 AnnotationDbi_1.45.1 IRanges_2.17.5
> > > [4] S4Vectors_0.21.23 Biobase_2.43.1 BiocGenerics_0.29.2
> > >
> > > loaded via a namespace (and not attached):
> > > [1] Rcpp_1.0.1 digest_0.6.18 DBI_1.0.0 RSQLite_2.1.1
> > > [5] blob_1.1.1 bit64_0.9-7 bit_1.1-14 compiler_3.7.0
> > > [9] pkgconfig_2.0.2 memoise_1.1.0
> >
> > _______________________________________________
> > Bioc-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
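For anyone hitting this before a fixed package is released, one possible stopgap is to drop any returned identifier that does not look like a human Ensembl gene ID. The following is only a minimal base-R sketch: the helper name `keep_human_ensembl` is hypothetical (it is not part of AnnotationDbi or org.Hs.eg.db), and it assumes human gene IDs have the form `ENSG` followed only by digits, which excludes the `ENSPTRG...` entries reported in the quoted messages.

```r
# Hypothetical helper (not from any package): keep only identifiers that
# match the assumed human Ensembl gene ID pattern, "ENSG" followed by digits.
keep_human_ensembl <- function(ids) {
  ids[grepl("^ENSG[0-9]+$", ids)]
}

# The two Ensembl IDs reported for GCG in the quoted select() output.
gcg_ids <- c("ENSPTRG00000000777", "ENSG00000115263")

human_ids <- keep_human_ensembl(gcg_ids)
print(human_ids)
```

The same filter could be applied to the ENSEMBL column returned by select() before taking the first match. It does not repair the underlying ensembl table in human.db0.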