[Bioc-devel] annotation data not updated?
Shepherd, Lori
Lori.Shepherd at RoswellPark.org
Fri Nov 17 19:34:01 CET 2017
I believe this should now be corrected. I had updated the standard org packages but not the additional OrgDbs. Please let me know if there seem to be any other issues.
Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
________________________________
From: James W. MacDonald <jmacdon at uw.edu>
Sent: Wednesday, November 15, 2017 11:11:07 AM
To: Shepherd, Lori
Cc: Van Twisk, Daniel; bioc-devel; Yu, Guangchuang
Subject: Re: [Bioc-devel] annotation data not updated?
On Wed, Nov 15, 2017 at 7:50 AM, Shepherd, Lori <Lori.Shepherd at roswellpark.org> wrote:
When this issue was brought up I updated the files that were downloaded when using AnnotationHub so they should be updated as well.
Thanks. How are the OrgDb files for AnnotationHub built? I just made one for Salmo salar using makeOrgPackageFromNCBI, and the GO IDs for that package match those in GO.db. One of the GO IDs in the AnnotationHub OrgDb for Salmo salar (that is not in GO.db) is GO:0044744, which was made a secondary ID for GO:0034504 on 6/29/2017, which seems too far in the past to have not been picked up by an update in November.
If I just pick another OrgDb at random, it has outdated GO IDs as well:
> query(hub, c("macaca","orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2017-10-27
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Macaca cynomolgus, Macaca mulatta, Macaca nemestrina
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH57977"]]'
title
AH57977 | org.Mmu.eg.db.sqlite
AH58035 | org.Macaca_nemestrina.eg.sqlite
AH58053 | org.Macaca_cynomolgus.eg.sqlite
> z <- hub[["AH58035"]]
downloading from https://annotationhub.bioconductor.org/fetch/64781
retrieving 1 resource
|======================================================================| 100%
> sum(!keys(z, "GOALL") %in% keys(GO.db))
[1] 13
> keys(z, "GOALL")[!keys(z, "GOALL") %in% keys(GO.db)]
[1] "GO:0007067" "GO:0016337" "GO:0044699" "GO:0044700" "GO:0044702"
[6] "GO:0044707" "GO:0044710" "GO:0044711" "GO:0044763" "GO:0044765"
[11] "GO:0044767" "GO:0098602" "GO:1902578"
So far as I can tell, all of these terms have been replaced, so it looks like the GO source data were outdated?
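The check run above boils down to a set difference between the GO IDs an OrgDb reports and the IDs GO.db knows about. A minimal sketch in base R, assuming the two ID vectors have already been extracted with `keys()`; the helper name `stale_go_ids` and the short `current_ids` vector are hypothetical and only for illustration:

```r
# Hypothetical helper: return the IDs present in an OrgDb but absent
# from the reference set (e.g. keys(GO.db)), de-duplicated and sorted.
stale_go_ids <- function(orgdb_ids, current_ids) {
  sort(setdiff(unique(orgdb_ids), current_ids))
}

# Toy vectors standing in for keys(z, "GOALL") and keys(GO.db).
# GO:0044744 and GO:0034504 are the IDs mentioned earlier in this thread;
# current_ids is an illustrative subset, not real GO.db content.
orgdb_ids   <- c("GO:0034504", "GO:0044744", "GO:0044744")
current_ids <- c("GO:0034504")

stale <- stale_go_ids(orgdb_ids, current_ids)
print(stale)
length(stale)
```

With real objects the call would presumably be `stale_go_ids(keys(z, "GOALL"), keys(GO.db))`; its length should agree with the `sum(!... %in% ...)` count as long as `keys()` returns unique IDs.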
Jim
The files were updated but the rdatadateadded was not updated when I added the new files.
________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of James W. MacDonald <jmacdon at uw.edu>
Sent: Tuesday, November 14, 2017 7:54:54 PM
To: Van Twisk, Daniel
Cc: bioc-devel; Yu, Guangchuang
Subject: Re: [Bioc-devel] annotation data not updated?
On Thu, Nov 9, 2017 at 9:48 AM, Van Twisk, Daniel <Daniel.VanTwisk at roswellpark.org> wrote:
> Thanks for looking into this. New versions of the OrgDbs and Db0s
> (v3.5.0) are now available that have up-to-date resources. Here is the
> output of the new org.Hs.eg.db
Does this issue affect the OrgDbs on AnnotationHub as well? I am finding
e.g., that the OrgDb for Salmo salar contains GO IDs that no longer exist
in GO.db.
> zz
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Salmo salar
| SPECIES: Salmo salar
| CENTRALID: GID
| Taxonomy ID: 8030
| Db type: OrgDb
| Supporting package: AnnotationDbi
Please see: help('select') for usage information
> sum(!keys(zz, "GOALL") %in% keys(GO.db))
[1] 38
But this isn't true of, for example, the Homo sapiens OrgDb from
AnnotationHub
> z
OrgDb object:
| DBSCHEMAVERSION: 2.1
| Db type: OrgDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMAN_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| EGSOURCEDATE: 2017-Nov6
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: EG
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL:
ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
| GOSOURCEDATE: 2017-Nov01
| GOEGSOURCEDATE: 2017-Nov6
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL:
| GPSOURCEDATE: 2017-Oct9
| ENSOURCEDATE: 2017-Aug23
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Tue Nov 7 20:57:02 2017
Please see: help('select') for usage information
> sum(!keys(z, "GOALL") %in% keys(GO.db))
[1] 0
But I am not sure when they were added, because the human OrgDb has an
rdatadateadded that is obviously not correct, since it precedes the
SOURCEDATEs from the OrgDb itself!
> mcols(hub["AH57973"])$rdatadateadded <------ Human
[1] "2017-10-23"
> mcols(hub["AH58003"])$rdatadateadded <------ Salmo
[1] "2017-10-27"
Best,
Jim
>
> > x <- org.Hs.eg.db
> > x
> OrgDb object:
> | DBSCHEMAVERSION: 2.1
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
> | DBSCHEMA: HUMAN_DB
> | ORGANISM: Homo sapiens
> | SPECIES: Human
> | EGSOURCEDATE: 2017-Nov6
> | EGSOURCENAME: Entrez Gene
> | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | CENTRALID: EG
> | TAXID: 9606
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/
> latest-lite/
> | GOSOURCEDATE: 2017-Nov01
> | GOEGSOURCEDATE: 2017-Nov6
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | KEGGSOURCENAME: KEGG GENOME
> | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
> | KEGGSOURCEDATE: 2011-Mar15
> | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
> | GPSOURCEURL:
> | GPSOURCEDATE: 2017-Oct9
> | ENSOURCEDATE: 2017-Aug23
> | ENSOURCENAME: Ensembl
> | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
> | UPSOURCENAME: Uniprot
> | UPSOURCEURL: http://www.UniProt.org/
> | UPSOURCEDATE: Tue Nov 7 20:57:02 2017
>
>
> ________________________________
> From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of
> Obenchain, Valerie <Valerie.Obenchain at RoswellPark.org>
> Sent: Thursday, November 2, 2017 12:47:43 PM
> To: Yu, Guangchuang; bioc-devel
> Subject: Re: [Bioc-devel] annotation data not updated?
>
> Guangchuang,
>
> Thanks for reporting this. We've looked into it and there is indeed a more
> recent version of the data. Daniel is working on re-generating the db0 and
> OrgDb packages. We'll post back with more information when the packages are
> ready.
>
> Valerie
>
>
> On 11/02/2017 05:40 AM, Yu, Guangchuang wrote:
>
> Dear all,
>
> I just upgraded to BioC 3.6 and found that the source data of org.Hs.eg.db
> and GO.db are still about half a year old.
>
> I was wondering whether these packages had been updated in the current release.
>
>
>
> org.Hs.eg.db
>
>
> OrgDb object:
> | DBSCHEMAVERSION: 2.1
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
> | DBSCHEMA: HUMAN_DB
> | ORGANISM: Homo sapiens
> | SPECIES: Human
> | EGSOURCEDATE: *2017-Mar29*
> | EGSOURCENAME: Entrez Gene
> | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | CENTRALID: EG
> | TAXID: 9606
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/
> latest-lite/
> | GOSOURCEDATE: *2017-Mar29*
> | GOEGSOURCEDATE: 2017-Mar29
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | KEGGSOURCENAME: KEGG GENOME
> | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
> | KEGGSOURCEDATE: 2011-Mar15
> | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
> | GPSOURCEURL:
> | GPSOURCEDATE: 2017-Sep7
> | ENSOURCEDATE: 2017-Mar29
> | ENSOURCENAME: Ensembl
> | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
> | UPSOURCENAME: Uniprot
> | UPSOURCEURL: http://www.UniProt.org/
> | UPSOURCEDATE: Thu Oct 5 16:07:33 2017
>
> Please see: help('select') for usage information
>
>
> GO.db
>
>
> GODb object:
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/
> latest-lite/
> | GOSOURCEDATE: *2017-Mar29*
> | Db type: GODb
> | package: AnnotationDbi
> | DBSCHEMA: GO_DB
> | GOEGSOURCEDATE: 2017-Mar29
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | DBSCHEMAVERSION: 2.1
>
> Please see: help('select') for usage information
>
>
> sessionInfo()
>
>
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS Sierra 10.12.6
>
> Matrix products: default
> BLAS: /Library/Frameworks/R.framework/Versions/3.4/
> Resources/lib/libRblas.0.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/
> Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] org.Hs.eg.db_3.4.2 GO.db_3.4.2 AnnotationDbi_1.40.0
> [4] IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
> [7] BiocGenerics_0.24.0 rvcheck_0.0.9 rmarkdown_1.6
> [10] roxygen2_6.0.1 magrittr_1.5 BiocInstaller_1.28.0
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13 knitr_1.17 xml2_1.1.1 bit_1.1-12
> [5] R6_2.2.2 rlang_0.1.2 blob_1.1.0 stringr_1.2.0
> [9] tools_3.4.2 DBI_0.7 htmltools_0.3.6 commonmark_1.4
> [13] bit64_0.9-7 rprojroot_1.2 digest_0.6.12 tibble_1.3.4
> [17] memoise_1.1.0 RSQLite_2.0 evaluate_0.10.1 stringi_1.1.5
> [21] compiler_3.4.2 backports_1.1.1 pkgconfig_2.0.1
>
>
> This email message may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), or the employee or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited. If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099