[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Aaron Lun infinite.monkeys.with.keyboards using gmail.com
Fri Apr 26 01:44:07 CEST 2019


It doesn't seem like it - on my installation, org.Hs.eg.db is still...
monkeying around.
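
Until a fixed package is out, a stopgap along the following lines might help.
This is only a sketch: is_human_ensembl_gene() is a hypothetical helper, not
part of AnnotationDbi, and it assumes human Ensembl gene IDs are "ENSG"
followed by 11 digits (which holds for the two IDs quoted below). It first
tabulates the ID prefixes, then keeps only the human-looking IDs.

```r
# Hypothetical helper (not part of AnnotationDbi): TRUE for IDs that look
# like human Ensembl gene IDs, i.e. "ENSG" followed by exactly 11 digits.
is_human_ensembl_gene <- function(ids) {
  grepl("^ENSG[0-9]{11}$", ids)
}

# The two Ensembl IDs reported for GCG in this thread.
ids <- c("ENSPTRG00000000777", "ENSG00000115263")

# Strip the trailing digits and tabulate the prefixes, to see which
# species codes are present.
prefix_counts <- table(sub("[0-9]+$", "", ids))

# Keep only the human-looking IDs.
human_ids <- ids[is_human_ensembl_gene(ids)]
```

The same predicate could presumably be applied to a select() result, e.g.
res[is_human_ensembl_gene(res$ENSEMBL), ], but that only hides the extra
rows on the user side and does not fix the underlying table.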

On Thu, Apr 25, 2019 at 9:17 AM Vincent Carey <stvjc using channing.harvard.edu>
wrote:

> Has this situation been rectified?
>
> On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <
> Daniel.VanTwisk using roswellpark.org> wrote:
>
>> We've made some changes to our annotation generation scripts this release
>> and it seems these may have introduced some errors. Thank you for
>> identifying this issue and I will try to have some fixes out asap.
>>
>> ________________________________
>> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of James
>> W. MacDonald <jmacdon using uw.edu>
>> Sent: Tuesday, April 23, 2019 11:03:02 AM
>> To: Aaron Lun
>> Cc: Bioc-devel
>> Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db
>>
>> Looks like the ensembl table of the human.db0 package got polluted with
>> *Pan troglodytes* genes:
>>
>> > con <- dbConnect(SQLite(), "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
>> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
>>   count(*)
>> 1    16207
>> > dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
>>   count(*)
>> 1    28973
>>
>> On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <
>> infinite.monkeys.with.keyboards using gmail.com> wrote:
>>
>> > Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?
>> >
>> >  > library(org.Hs.eg.db)
>> >  > mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
>> > 'select()' returned 1:many mapping between keys and columns
>> >                   GCG
>> > "ENSPTRG00000000777"
>> >
>> > Well, at least it still recovers the right identifier... eventually.
>> >
>> >  > select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
>> > 'select()' returned 1:many mapping between keys and columns
>> >    SYMBOL            ENSEMBL
>> > 1    GCG ENSPTRG00000000777
>> > 2    GCG    ENSG00000115263
>> >
>> > The SYMBOL->Entrez ID relational table seems to be okay:
>> >
>> >  > Y <- toTable(org.Hs.egSYMBOL)
>> >  > Y[which(Y[,2]=="GCG"),]
>> >       gene_id symbol
>> > 2152    2641    GCG
>> >
>> > So the cause is the Ensembl->Entrez mappings:
>> >
>> >  > Z <- toTable(org.Hs.egENSEMBL2EG)
>> >  > Z[Z[,1]==2641,]
>> >       gene_id         ensembl_id
>> > 3028    2641 ENSPTRG00000000777
>> > 3029    2641    ENSG00000115263
>> >
>> > Googling suggests that ENSPTRG00000000777 is an identifier for some
>> > other gene in one of the other monkeys. Hardly "Hs" stuff.
>> >
>> > Session info (not technically R 3.6, but I didn't think that would have
>> > been the cause):
>> >
>> > > R Under development (unstable) (2019-04-11 r76379)
>> > > Platform: x86_64-pc-linux-gnu (64-bit)
>> > > Running under: Ubuntu 18.04.2 LTS
>> > >
>> > > Matrix products: default
>> > > BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
>> > > LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so
>> > >
>> > > locale:
>> > >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> > >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> > >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> > >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> > >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> > >
>> > > attached base packages:
>> > > [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>> > > [8] methods   base
>> > >
>> > > other attached packages:
>> > > [1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
>> > > [4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2
>> > >
>> > > loaded via a namespace (and not attached):
>> > >  [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
>> > >  [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
>> > >  [9] pkgconfig_2.0.2 memoise_1.1.0
>> >
>> > _______________________________________________
>> > Bioc-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>