[Bioc-devel] Question about org.Dr.eg.db package

James W. MacDonald jmacdon at uw.edu
Fri Aug 14 00:00:15 CEST 2020


Glad to help!

On Thu, Aug 13, 2020 at 5:51 PM Margolin, Gennady (NIH/NICHD) [C] <
gennady.margolin using nih.gov> wrote:

> Hi Jim,
>
> Awesome, that makes sense now. I had wondered whether org.Dr.eg.db contains
> only functional annotation, which I assumed because it does not refer to a
> specific genome (unlike the TxDb packages), but then I found the location
> mappings I described in my previous emails.
>
> Thank you very much,
>
> Gennady
>
> From: "James W. MacDonald" <jmacdon using uw.edu>
> Reply-To: "jmacdon using u.washington.edu" <jmacdon using u.washington.edu>
> Date: Thursday, August 13, 2020 at 5:41 PM
> To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margolin using nih.gov>
> Cc: Vincent Carey <stvjc using channing.harvard.edu>,
> "bioc-devel using r-project.org" <bioc-devel using r-project.org>
> Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
>
> Hi Gennady,
>
> That information should probably be cleaned up, and the BiMaps that point
> to the location data removed. While the OrgDbs do contain position
> information, it has been deprecated, as you would find if you tried to
> query it using select():
>
> > select(org.Dr.eg.db, "30037", "CHR")
> 'select()' returned 1:1 mapping between keys and columns
>   ENTREZID CHR
> 1    30037   5
> Warning message:
> In .deprecatedColsMessage() :
>   Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
>   deprecated. Please use a range based accessor like genes(), or select()
>   with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
>   object instead.
>
> The rationale is that the OrgDb packages are intended to contain
> functional annotations, which are not based on any genome build and are
> instead current as of the date the OrgDb package was constructed. Since
> positional information should be based on a genome release, those data
> have been migrated to the TxDb and EnsDb packages, which are each based on
> a given release.
>
> Put a different way, the data in an OrgDb package are downloaded from NCBI
> as of a particular date, and the positional data are whatever NCBI provided
> on that date. This is obviously a problem for the positional data, because
> what we get isn't necessarily build-specific. We get the TxDb data from the
> UCSC Genome Browser, which is build-specific, so we can tell end users
> exactly which build the data come from. Ideally these data would be made
> defunct in the OrgDb packages, but that hasn't happened yet.
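A minimal sketch of the range-based lookup that the deprecation warning points to, assuming the Bioconductor annotation package TxDb.Drerio.UCSC.danRer11.refGene is installed (that package, and the choice of the UCSC refGene track, are assumptions for illustration; any TxDb or EnsDb built from a known release would serve the same purpose). Unlike the OrgDb, a TxDb records the genome build its coordinates come from, which answers the original question of which assembly the positions refer to. For UCSC refGene-based TxDb packages the GENEID keys should be Entrez Gene IDs, so the ID used with select() on org.Dr.eg.db above can be looked up directly.

```r
## Sketch only: assumes TxDb.Drerio.UCSC.danRer11.refGene is installed,
## e.g. via BiocManager::install("TxDb.Drerio.UCSC.danRer11.refGene")
suppressPackageStartupMessages({
    library(GenomicFeatures)  # genes() and select() methods for TxDb objects
    library(TxDb.Drerio.UCSC.danRer11.refGene)
})

txdb <- TxDb.Drerio.UCSC.danRer11.refGene

## The TxDb records which genome build its coordinates come from
unique(genome(txdb))
metadata(txdb)

## Range-based accessor: one GRanges element per gene, named by gene ID.
## By default genes() drops genes that cannot be represented as a single
## range (transcripts on more than one chromosome or strand).
gr <- genes(txdb)
gr[names(gr) %in% "30037"]

## select() with transcript-level location columns, keyed on the gene ID;
## guarded so a key missing from this TxDb does not raise an error
if ("30037" %in% keys(txdb, keytype = "GENEID")) {
    select(txdb, keys = "30037",
           columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
           keytype = "GENEID")
}
```

The TXCHROM/TXSTART/TXEND columns are transcript-level, so a gene with several transcripts returns several rows, whereas genes() reports a single range spanning all of the gene's transcripts.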
>
> Best,
>
> Jim
>
> On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via
> Bioc-devel <bioc-devel using r-project.org> wrote:
>
> Hi Vincent,
>
> Thank you for responding.
>
> Here is an excerpt from the R documentation for this package. I have
> version 3.10.0; I doubt anything changed in the latest one, which is
> 3.11.4:
>
> -------------------------------------------------
> org.Dr.egCHRLOC {org.Dr.eg.db}
> Entrez Gene IDs to Chromosomal Location
> Description
> org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the
> starting position of the gene. The position of a gene is measured as the
> number of base pairs.
> The CHRLOCEND mapping is the same as the CHRLOC mapping except that it
> specifies the ending base of a gene instead of the start.
> ……
> -------------------------------------------------
>
> This output also does not show any genome version:
> > org.Dr.eg_dbInfo()
>                  name  value
> 1     DBSCHEMAVERSION  2.1
> 2             Db type  OrgDb
> 3  Supporting package  AnnotationDbi
> 4            DBSCHEMA  ZEBRAFISH_DB
> 5            ORGANISM  Danio rerio
> 6             SPECIES  Zebrafish
> 7        EGSOURCEDATE  2019-Jul10
> 8        EGSOURCENAME  Entrez Gene
> 9         EGSOURCEURL  ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> 10          CENTRALID  EG
> 11              TAXID  7955
> 12       GOSOURCENAME  Gene Ontology
> 13        GOSOURCEURL  ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> 14       GOSOURCEDATE  2019-Jul10
> 15     GOEGSOURCEDATE  2019-Jul10
> 16     GOEGSOURCENAME  Entrez Gene
> 17      GOEGSOURCEURL  ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> 18     KEGGSOURCENAME  KEGG GENOME
> 19      KEGGSOURCEURL  ftp://ftp.genome.jp/pub/kegg/genomes
> 20     KEGGSOURCEDATE  2011-Mar15
> 21       GPSOURCENAME  UCSC Genome Bioinformatics (Danio rerio)
> 22        GPSOURCEURL
> 23       GPSOURCEDATE  2017-Nov1
> 24       ENSOURCEDATE  2019-Jun24
> 25       ENSOURCENAME  Ensembl
> 26        ENSOURCEURL  ftp://ftp.ensembl.org/pub/current_fasta
> 27       UPSOURCENAME  Uniprot
> 28        UPSOURCEURL  http://www.UniProt.org/
> 29       UPSOURCEDATE  Mon Oct 21 14:32:30 2019
>
> From: Vincent Carey <stvjc using channing.harvard.edu>
> Date: Thursday, August 13, 2020 at 2:46 PM
> To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margolin using nih.gov>
> Cc: "bioc-devel using r-project.org" <bioc-devel using r-project.org>
> Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
>
> This should probably be posed to the support site. What version of the
> package are you using? Where are you seeing coordinates? I would expect
> those to be obtained from the TxDb package, or perhaps from AnnotationHub.
>
> > columns(org.Dr.eg.db)
>  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
>  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
> [11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
> [16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
> [21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"
>
> On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via
> Bioc-devel <bioc-devel using r-project.org> wrote:
> Hello,
>
> I have a short question: how do I figure out the genome version for the
> org.Dr.eg.db package? I couldn't see it in the DESCRIPTION file, and it's
> not in the org.Dr.eg_dbInfo() output either. It would be nice to know
> whether this is danRer11/GRCz11 or some other assembly, as there are
> coordinates present in the DB.
>
> Thank you,
> Gennady
>
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
>
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099