[Bioc-devel] annotation data not updated?

James W. MacDonald jmacdon at uw.edu
Wed Nov 15 17:11:07 CET 2017


On Wed, Nov 15, 2017 at 7:50 AM, Shepherd, Lori <
Lori.Shepherd at roswellpark.org> wrote:

> When this issue was brought up I updated the files that were downloaded
> when using AnnotationHub so they should be updated as well.
>

Thanks. How are the OrgDb files for AnnotationHub built? I just made one
for Salmo salar using makeOrgPackageFromNCBI, and the GO IDs for that
package match those in GO.db. One of the GO IDs in the AnnotationHub OrgDb
for Salmo salar (that is not in GO.db) is GO:0044744, which was made a
secondary ID for GO:0034504 on 6/29/2017, which seems too far in the past
to have not been picked up by an update in November.

If I just pick another OrgDb at random, it has outdated GO IDs as well:

>  query(hub, c("macaca","orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2017-10-27
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Macaca cynomolgus, Macaca mulatta, Macaca nemestrina
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH57977"]]'

            title
  AH57977 | org.Mmu.eg.db.sqlite
  AH58035 | org.Macaca_nemestrina.eg.sqlite
  AH58053 | org.Macaca_cynomolgus.eg.sqlite

> z <- hub[["AH58035"]]
downloading from  https://annotationhub.bioconductor.org/fetch/64781
retrieving 1 resource
  |======================================================================| 100%

> sum(!keys(z, "GOALL") %in% keys(GO.db))
[1] 13

> keys(z, "GOALL")[!keys(z, "GOALL") %in% keys(GO.db)]
 [1] "GO:0007067" "GO:0016337" "GO:0044699" "GO:0044700" "GO:0044702"
 [6] "GO:0044707" "GO:0044710" "GO:0044711" "GO:0044763" "GO:0044765"
[11] "GO:0044767" "GO:0098602" "GO:1902578"

So far as I can tell, all of these terms have been replaced, so it looks
like the GO source data were outdated when this OrgDb was built?
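
For anyone who wants to repeat this check on other OrgDbs, here is a minimal
sketch in base R. The helper name stale_ids and the toy vectors are
assumptions made up for illustration; with real data the two inputs would be
keys(z, "GOALL") and keys(GO.db), as in the session above.

```r
# Hypothetical helper (not part of AnnotationDbi): return the unique IDs
# that appear in `ids` but are missing from `reference`.
stale_ids <- function(ids, reference) {
  ids <- unique(ids)
  ids[!ids %in% reference]
}

# Toy stand-ins for keys(z, "GOALL") and keys(GO.db); values are illustrative.
orgdb_go <- c("GO:0007067", "GO:0008150", "GO:0016337", "GO:0008150")
godb_go  <- c("GO:0008150", "GO:0034504")

stale <- stale_ids(orgdb_go, godb_go)
print(stale)          # IDs the reference no longer contains
print(length(stale))  # same count as sum(!unique(ids) %in% reference)
```

Calling unique() first only matters when the input vector can contain
repeats; it keeps the count comparable across objects.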

Jim



>
> The files were updated but the rdatadateadded was not updated when I added
> the new files.
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of James W.
> MacDonald <jmacdon at uw.edu>
> Sent: Tuesday, November 14, 2017 7:54:54 PM
> To: Van Twisk, Daniel
> Cc: bioc-devel; Yu, Guangchuang
> Subject: Re: [Bioc-devel] annotation data not updated?
>
> On Thu, Nov 9, 2017 at 9:48 AM, Van Twisk, Daniel <
> Daniel.VanTwisk at roswellpark.org> wrote:
>
> > Thanks for looking into this.  New versions of the OrgDbs and Db0s
> > (v3.5.0) are now available that have up-to-date resources.  Here is the
> > output of the new org.Hs.eg.db
>
>
> Does this issue affect the OrgDbs on AnnotationHub as well? I am finding
> e.g., that the OrgDb for Salmo salar contains GO IDs that no longer exist
> in GO.db.
>
> > zz
> OrgDb object:
> | DBSCHEMAVERSION: 2.1
> | DBSCHEMA: NOSCHEMA_DB
> | ORGANISM: Salmo salar
> | SPECIES: Salmo salar
> | CENTRALID: GID
> | Taxonomy ID: 8030
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
>
> Please see: help('select') for usage information
> > sum(!keys(zz, "GOALL") %in% keys(GO.db))
> [1] 38
>
> But this isn't true of, for example, the Homo sapiens OrgDb from
> AnnotationHub
>
> > z
> OrgDb object:
> | DBSCHEMAVERSION: 2.1
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
> | DBSCHEMA: HUMAN_DB
> | ORGANISM: Homo sapiens
> | SPECIES: Human
> | EGSOURCEDATE: 2017-Nov6
> | EGSOURCENAME: Entrez Gene
> | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | CENTRALID: EG
> | TAXID: 9606
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> | GOSOURCEDATE: 2017-Nov01
> | GOEGSOURCEDATE: 2017-Nov6
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | KEGGSOURCENAME: KEGG GENOME
> | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
> | KEGGSOURCEDATE: 2011-Mar15
> | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
> | GPSOURCEURL:
> | GPSOURCEDATE: 2017-Oct9
> | ENSOURCEDATE: 2017-Aug23
> | ENSOURCENAME: Ensembl
> | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
> | UPSOURCENAME: Uniprot
> | UPSOURCEURL: http://www.UniProt.org/
> | UPSOURCEDATE: Tue Nov  7 20:57:02 2017
>
> Please see: help('select') for usage information
> > sum(!keys(z, "GOALL") %in% keys(GO.db))
> [1] 0
>
>
> But I am not sure when they were added, because the human OrgDb has an
> rdatadateadded that is obviously not correct, since it precedes the
> SOURCEDATEs from the OrgDb itself!
>
> > mcols(hub["AH57973"])$rdatadateadded  # Human
> [1] "2017-10-23"
> > mcols(hub["AH58003"])$rdatadateadded  # Salmo salar
> [1] "2017-10-27"
>
> Best,
>
> Jim
>
>
>
>
>
> >
> > > x <- org.Hs.eg.db
> > > x
> > OrgDb object:
> > | DBSCHEMAVERSION: 2.1
> > | Db type: OrgDb
> > | Supporting package: AnnotationDbi
> > | DBSCHEMA: HUMAN_DB
> > | ORGANISM: Homo sapiens
> > | SPECIES: Human
> > | EGSOURCEDATE: 2017-Nov6
> > | EGSOURCENAME: Entrez Gene
> > | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> > | CENTRALID: EG
> > | TAXID: 9606
> > | GOSOURCENAME: Gene Ontology
> > | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> > | GOSOURCEDATE: 2017-Nov01
> > | GOEGSOURCEDATE: 2017-Nov6
> > | GOEGSOURCENAME: Entrez Gene
> > | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> > | KEGGSOURCENAME: KEGG GENOME
> > | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
> > | KEGGSOURCEDATE: 2011-Mar15
> > | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
> > | GPSOURCEURL:
> > | GPSOURCEDATE: 2017-Oct9
> > | ENSOURCEDATE: 2017-Aug23
> > | ENSOURCENAME: Ensembl
> > | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
> > | UPSOURCENAME: Uniprot
> > | UPSOURCEURL: http://www.UniProt.org/
> > | UPSOURCEDATE: Tue Nov  7 20:57:02 2017
> >
> >
> > ________________________________
> > From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> > Obenchain, Valerie <Valerie.Obenchain at RoswellPark.org>
> > Sent: Thursday, November 2, 2017 12:47:43 PM
> > To: Yu, Guangchuang; bioc-devel
> > Subject: Re: [Bioc-devel] annotation data not updated?
> >
> > Guangchuang,
> >
> > Thanks for reporting this. We've looked into it and there is indeed a
> > more recent version of the data. Daniel is working on re-generating the
> > db0 and OrgDb packages. We'll post back with more information when the
> > packages are ready.
> >
> > Valerie
> >
> >
> > On 11/02/2017 05:40 AM, Yu, Guangchuang wrote:
> >
> > Dear all,
> >
> > I just upgraded to BioC 3.6 and found that the source data for
> > org.Hs.eg.db and GO.db are still about half a year old.
> >
> > I was wondering whether these packages have been updated in the current
> > release.
> >
> >
> >
> > org.Hs.eg.db
> >
> >
> > OrgDb object:
> > | DBSCHEMAVERSION: 2.1
> > | Db type: OrgDb
> > | Supporting package: AnnotationDbi
> > | DBSCHEMA: HUMAN_DB
> > | ORGANISM: Homo sapiens
> > | SPECIES: Human
> > | EGSOURCEDATE: *2017-Mar29*
> > | EGSOURCENAME: Entrez Gene
> > | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> > | CENTRALID: EG
> > | TAXID: 9606
> > | GOSOURCENAME: Gene Ontology
> > | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> > | GOSOURCEDATE: *2017-Mar29*
> > | GOEGSOURCEDATE: 2017-Mar29
> > | GOEGSOURCENAME: Entrez Gene
> > | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> > | KEGGSOURCENAME: KEGG GENOME
> > | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
> > | KEGGSOURCEDATE: 2011-Mar15
> > | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
> > | GPSOURCEURL:
> > | GPSOURCEDATE: 2017-Sep7
> > | ENSOURCEDATE: 2017-Mar29
> > | ENSOURCENAME: Ensembl
> > | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
> > | UPSOURCENAME: Uniprot
> > | UPSOURCEURL: http://www.UniProt.org/
> > | UPSOURCEDATE: Thu Oct  5 16:07:33 2017
> >
> > Please see: help('select') for usage information
> >
> >
> > GO.db
> >
> >
> > GODb object:
> > | GOSOURCENAME: Gene Ontology
> > | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> > | GOSOURCEDATE: *2017-Mar29*
> > | Db type: GODb
> > | package: AnnotationDbi
> > | DBSCHEMA: GO_DB
> > | GOEGSOURCEDATE: 2017-Mar29
> > | GOEGSOURCENAME: Entrez Gene
> > | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> > | DBSCHEMAVERSION: 2.1
> >
> > Please see: help('select') for usage information
> >
> >
> > sessionInfo()
> >
> >
> > R version 3.4.2 (2017-09-28)
> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
> > Running under: macOS Sierra 10.12.6
> >
> > Matrix products: default
> > BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
> > LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] parallel  stats4    stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> >  [1] org.Hs.eg.db_3.4.2   GO.db_3.4.2          AnnotationDbi_1.40.0
> >  [4] IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0
> >  [7] BiocGenerics_0.24.0  rvcheck_0.0.9        rmarkdown_1.6
> > [10] roxygen2_6.0.1       magrittr_1.5         BiocInstaller_1.28.0
> >
> > loaded via a namespace (and not attached):
> >  [1] Rcpp_0.12.13    knitr_1.17      xml2_1.1.1      bit_1.1-12
> >  [5] R6_2.2.2        rlang_0.1.2     blob_1.1.0      stringr_1.2.0
> >  [9] tools_3.4.2     DBI_0.7         htmltools_0.3.6 commonmark_1.4
> > [13] bit64_0.9-7     rprojroot_1.2   digest_0.6.12   tibble_1.3.4
> > [17] memoise_1.1.0   RSQLite_2.0     evaluate_0.10.1 stringi_1.1.5
> > [21] compiler_3.4.2  backports_1.1.1 pkgconfig_2.0.1
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > This email message may contain legally privileged and/or confidential
> > information.  If you are not the intended recipient(s), or the employee or
> > agent responsible for the delivery of this message to the intended
> > recipient(s), you are hereby notified that any disclosure, copying,
> > distribution, or use of this email message is prohibited.  If you have
> > received this message in error, please notify the sender immediately by
> > e-mail and delete this email message from your computer. Thank you.
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>         [[alternative HTML version deleted]]
>
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
