[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Aaron Lun infinite.monkeys.with.keyboards at gmail.com
Tue Apr 23 05:53:35 CEST 2019


Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?

 > library(org.Hs.eg.db)
 > mapIds(org.Hs.eg.db, keys="GCG", keytype="SYMBOL", column="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
                  GCG
"ENSPTRG00000000777"

Well, at least select() still recovers the right identifier... eventually.

 > select(org.Hs.eg.db, keys="GCG", keytype="SYMBOL", columns="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
   SYMBOL            ENSEMBL
1    GCG ENSPTRG00000000777
2    GCG    ENSG00000115263
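mapIds() resolves 1:many mappings with its default multiVals="first", which here happens to pick the chimp-looking ID that select() lists first. A minimal workaround sketch, assuming human Ensembl gene IDs can be recognised as "ENSG" followed by 11 digits: the hypothetical helper pickHumanEnsembl() below returns the first such ID for a key. It is exercised on the two IDs from the output above, so it runs without org.Hs.eg.db.

```r
# Hypothetical helper (not part of AnnotationDbi): given all Ensembl IDs
# mapped to one key, return the first human gene ID, assumed to match
# "ENSG" + 11 digits, or NA if there is none.
pickHumanEnsembl <- function(ids) {
  human <- ids[grepl("^ENSG[0-9]{11}$", ids)]
  if (length(human) > 0) human[1] else NA_character_
}

# The two IDs returned for GCG by select() above
gcg_ids <- c("ENSPTRG00000000777", "ENSG00000115263")
pickHumanEnsembl(gcg_ids)               # returns "ENSG00000115263"
pickHumanEnsembl("ENSPTRG00000000777")  # returns NA
```

Since mapIds() also accepts a function for multiVals, a call along the lines of mapIds(org.Hs.eg.db, keys="GCG", keytype="SYMBOL", column="ENSEMBL", multiVals=pickHumanEnsembl) should then prefer the ENSG ID, assuming the function is applied to each key's set of matches. This only papers over the symptom; the stray IDs would still be in the package.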

The SYMBOL->Entrez ID relational table seems to be okay:

 > Y <- toTable(org.Hs.egSYMBOL)
 > Y[which(Y[,2]=="GCG"),]
      gene_id symbol
2152    2641    GCG

So the culprit is the Ensembl->Entrez mapping table:

 > Z <- toTable(org.Hs.egENSEMBL2EG)
 > Z[Z[,1]==2641,]
      gene_id         ensembl_id
3028    2641 ENSPTRG00000000777
3029    2641    ENSG00000115263

Googling suggests that ENSPTRG00000000777 is an identifier for some 
other gene in one of the other monkeys (ENSPTR is Ensembl's prefix for 
chimpanzee, Pan troglodytes). Hardly "Hs" stuff.
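To gauge how widespread this is, one could tabulate the alphabetic prefix of every Ensembl ID in the Ensembl->Entrez table; anything other than ENSG would be suspect. The sketch below uses a small data frame mimicking the two GCG rows of toTable(org.Hs.egENSEMBL2EG) shown above so that it runs standalone. Stripping trailing digits to get the prefix is an assumption about the ID format (letters followed by digits).

```r
# Toy stand-in for toTable(org.Hs.egENSEMBL2EG), using the GCG rows above
Z <- data.frame(gene_id    = c("2641", "2641"),
                ensembl_id = c("ENSPTRG00000000777", "ENSG00000115263"),
                stringsAsFactors = FALSE)

# Drop the trailing digits to leave the prefix, e.g. "ENSG" or "ENSPTRG"
prefix <- sub("[0-9]+$", "", Z$ensembl_id)
table(prefix)

# Rows whose ID does not look like a human gene ID
suspect <- Z[prefix != "ENSG", ]
suspect
```

Running the same lines with Z <- toTable(org.Hs.egENSEMBL2EG) would list every non-ENSG prefix in the real table along with its count.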

Session info (not technically R 3.6, but I didn't think that would have 
been the cause):

> R Under development (unstable) (2019-04-11 r76379)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
> 
> Matrix products: default
> BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
> LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
> [8] methods   base     
> 
> other attached packages:
> [1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5      
> [4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2 
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1  
>  [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0 
>  [9] pkgconfig_2.0.2 memoise_1.1.0


