[Bioc-devel] Question about org.Dr.eg.db package

Margolin, Gennady (NIH/NICHD) [C] gennady.margolin using nih.gov
Thu Aug 13 23:50:03 CEST 2020


Hi Jim,

Awesome, that makes sense now. I had wondered whether org.Dr.eg.db contains only functional annotation, which is what I assumed because, unlike the TxDb packages, it does not refer to a specific genome; but then I found the coordinate mappings described in my previous emails.

Thank you very much,
Gennady

From: "James W. MacDonald" <jmacdon using uw.edu>
Reply-To: "jmacdon using u.washington.edu" <jmacdon using u.washington.edu>
Date: Thursday, August 13, 2020 at 5:41 PM
To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margolin using nih.gov>
Cc: Vincent Carey <stvjc using channing.harvard.edu>, "bioc-devel using r-project.org" <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package

Hi Gennady,

That information should probably be cleaned up, and the BiMaps that point to the location data removed. While the OrgDbs do contain position information, it's been deprecated, which you would find if you tried to query using select():

> select(org.Dr.eg.db, "30037", "CHR")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID CHR
1    30037   5
Warning message:
In .deprecatedColsMessage() :
  Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
  deprecated. Please use a range based accessor like genes(), or select()
  with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
  object instead.

The rationale is that the OrgDb packages are intended to contain functional annotations, which are not based on any genome build and are instead current as of the date the OrgDb package was constructed. Since positional information should be based on a genome release, those data have been migrated to the TxDb and EnsDb packages, which are based on a given release.

Put a different way, the data in an OrgDb package is downloaded from NCBI as of a particular date, and the positional data we get are whatever we got from NCBI on that date. This is obviously a problem for the positional data, because what we get isn't necessarily build-specific. We get the TxDb data from the UCSC Genome Browser, which is build specific, so we can tell end users exactly what build the data come from. Ideally these data would be defunct in the OrgDb packages, but it hasn't happened yet.
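A minimal sketch of the range-based alternative suggested in the deprecation warning, assuming the Bioconductor package TxDb.Drerio.UCSC.danRer11.refGene is installed. That package is not named anywhere in this thread; it is used here only as one example of a build-specific zebrafish TxDb, and the sketch further assumes its GENEID keys are Entrez Gene IDs, as with the UCSC refGene-based TxDbs. The build is read from the TxDb itself, coordinates come from genes() and select(), and the OrgDb is used only for functional annotation:

```r
# Sketch only. Assumes these Bioconductor packages are installed, e.g. via
# BiocManager::install(c("org.Dr.eg.db", "TxDb.Drerio.UCSC.danRer11.refGene"))
suppressPackageStartupMessages({
  library(GenomicFeatures)                     # genes(), select() on TxDb objects
  library(org.Dr.eg.db)                        # functional annotation (OrgDb)
  library(TxDb.Drerio.UCSC.danRer11.refGene)   # assumed build-specific TxDb
})

txdb <- TxDb.Drerio.UCSC.danRer11.refGene
gene_id <- "30037"  # the Entrez Gene ID from the select() example above

# The TxDb records the genome build its coordinates come from
build <- unique(genome(txdb))
print(build)

# Range-based accessor: a GRanges named by gene ID (assumed to be Entrez IDs)
gr <- genes(txdb)
print(gr[names(gr) == gene_id])

# select() with transcript-level location columns, guarded in case the
# gene is absent from this particular TxDb
if (gene_id %in% keys(txdb, keytype = "GENEID")) {
  print(select(txdb, keys = gene_id, keytype = "GENEID",
               columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND")))
}

# Build-independent functional annotation still comes from the OrgDb
print(select(org.Dr.eg.db, keys = gene_id, keytype = "ENTREZID",
             columns = c("SYMBOL", "GENENAME")))
```

Because the coordinates and the build name come from the same TxDb object, they can always be reported together, which is exactly what the OrgDb positional mappings cannot guarantee.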

Best,

Jim



On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel <bioc-devel using r-project.org> wrote:
Hi Vincent,

Thank you for responding.

Here is an excerpt from the package's R help page (I have version 3.10.0; I doubt anything has changed in the latest version, 3.11.4):

-------------------------------------------------
org.Dr.egCHRLOC {org.Dr.eg.db}
Entrez Gene IDs to Chromosomal Location
Description
org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the starting position of the gene. The position of a gene is measured as the number of base pairs.
The CHRLOCEND mapping is the same as the CHRLOC mapping except that it specifies the ending base of a gene instead of the start.
……
-------------------------------------------------

This output also does not show any genome version:
> org.Dr.eg_dbInfo()
                 name                                                             value
1     DBSCHEMAVERSION                                                               2.1
2             Db type                                                             OrgDb
3  Supporting package                                                     AnnotationDbi
4            DBSCHEMA                                                      ZEBRAFISH_DB
5            ORGANISM                                                       Danio rerio
6             SPECIES                                                         Zebrafish
7        EGSOURCEDATE                                                        2019-Jul10
8        EGSOURCENAME                                                       Entrez Gene
9         EGSOURCEURL                              ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10          CENTRALID                                                                EG
11              TAXID                                                              7955
12       GOSOURCENAME                                                     Gene Ontology
13        GOSOURCEURL ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
14       GOSOURCEDATE                                                        2019-Jul10
15     GOEGSOURCEDATE                                                        2019-Jul10
16     GOEGSOURCENAME                                                       Entrez Gene
17      GOEGSOURCEURL                              ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
18     KEGGSOURCENAME                                                       KEGG GENOME
19      KEGGSOURCEURL                              ftp://ftp.genome.jp/pub/kegg/genomes
20     KEGGSOURCEDATE                                                        2011-Mar15
21       GPSOURCENAME                          UCSC Genome Bioinformatics (Danio rerio)
22        GPSOURCEURL
23       GPSOURCEDATE                                                         2017-Nov1
24       ENSOURCEDATE                                                        2019-Jun24
25       ENSOURCENAME                                                           Ensembl
26        ENSOURCEURL                           ftp://ftp.ensembl.org/pub/current_fasta
27       UPSOURCENAME                                                           Uniprot
28        UPSOURCEURL                                           http://www.UniProt.org/
29       UPSOURCEDATE                                          Mon Oct 21 14:32:30 2019

From: Vincent Carey <stvjc using channing.harvard.edu>
Date: Thursday, August 13, 2020 at 2:46 PM
To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margolin using nih.gov>
Cc: "bioc-devel using r-project.org" <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package

This should probably be posed to the support site. What version of the package are you using? Where are you seeing coordinates? I would expect those to be obtained from the TxDb package, or perhaps from AnnotationHub.


> columns(org.Dr.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
[11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
[16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
[21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"


On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel <bioc-devel using r-project.org> wrote:
Hello,

I have a short question: how do I figure out the genome version for the org.Dr.eg.db package? I couldn't see it in the DESCRIPTION file, and it's not in the org.Dr.eg_dbInfo() output either. It would be nice to know whether this is danRer11/GRCz11 or some other assembly, as there are coordinates present in the DB.

Thank you,
Gennady

_______________________________________________
Bioc-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
