[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Van Twisk, Daniel
Tue Apr 23 17:40:13 CEST 2019


We changed our annotation generation scripts for this release, and it seems those changes introduced some errors. Thank you for identifying this issue; I will try to get fixes out as soon as possible.

________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of James W. MacDonald <jmacdon at uw.edu>
Sent: Tuesday, April 23, 2019 11:03:02 AM
To: Aaron Lun
Cc: Bioc-devel
Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Looks like the ensembl table of the human.db0 package got polluted with *Pan
troglodytes* genes:

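> library(DBI)      # provides dbConnect() and dbGetQuery()
> library(RSQLite)  # provides the SQLite() driver
> con <- dbConnect(SQLite(),
"/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")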
  count(*)
1    16207
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
  count(*)
1    28973
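
Until a fixed package is out, one possible user-side workaround is to drop any Ensembl ID that does not look like a human gene ID. The sketch below is only an illustration: it assumes human Ensembl gene IDs match "ENSG" followed by digits, it uses a toy copy of the select() output quoted further down in this thread rather than the real database, and keep_human_ensembl() is a hypothetical helper name, not part of AnnotationDbi.

```r
# Toy copy of the select() result quoted in this thread (not a live query).
res <- data.frame(
  SYMBOL  = c("GCG", "GCG"),
  ENSEMBL = c("ENSPTRG00000000777", "ENSG00000115263"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose ID is "ENSG" followed only by digits.
# Assumption: this pattern identifies human Ensembl gene IDs; other species
# carry an extra species code (e.g. "ENSPTRG" for Pan troglodytes).
keep_human_ensembl <- function(df, col = "ENSEMBL") {
  # grepl() returns FALSE for NA, so unmapped rows are dropped as well.
  df[grepl("^ENSG[0-9]+$", df[[col]]), , drop = FALSE]
}

human_only <- keep_human_ensembl(res)
print(human_only)
```

The same filter could be applied to the data frame returned by select() on org.Hs.eg.db, though it only hides the stray rows and does not repair the underlying table.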

On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <infinite.monkeys.with.keyboards at gmail.com> wrote:

> Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
>
>  > library(org.Hs.eg.db)
>  > mapIds(org.Hs.eg.db, keys="GCG", keytype="SYMBOL", column="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>                   GCG
> "ENSPTRG00000000777"
>
> Well, at least it still recovers the right identifier... eventually.
>
>  > select(org.Hs.eg.db, keys="GCG", keytype="SYMBOL", columns="ENSEMBL")
> 'select()' returned 1:many mapping between keys and columns
>    SYMBOL            ENSEMBL
> 1    GCG ENSPTRG00000000777
> 2    GCG    ENSG00000115263
>
> The SYMBOL->Entrez ID relational table seems to be okay:
>
>  > Y <- toTable(org.Hs.egSYMBOL)
>  > Y[which(Y[,2]=="GCG"),]
>       gene_id symbol
> 2152    2641    GCG
>
> So the cause is the Ensembl->Entrez mappings:
>
>  > Z <- toTable(org.Hs.egENSEMBL2EG)
>  > Z[Z[,1]==2641,]
>       gene_id         ensembl_id
> 3028    2641 ENSPTRG00000000777
> 3029    2641    ENSG00000115263
>
> Googling suggests that ENSPTRG00000000777 is an identifier for some
> other gene in one of the other monkeys. Hardly "Hs" stuff.
>
> Session info (not technically R 3.6, but I didn't think that would have
> been the cause):
>
> > R Under development (unstable) (2019-04-11 r76379)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 18.04.2 LTS
> >
> > Matrix products: default
> > BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
> > LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
> > [4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2
> >
> > loaded via a namespace (and not attached):
> >  [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
> >  [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
> >  [9] pkgconfig_2.0.2 memoise_1.1.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


