[Bioc-devel] making txdb, and propagating metadata from AnnotationHub to GenomicFeatures

Michael Love michaelisaiahlove at gmail.com
Wed Jun 17 09:35:51 CEST 2015


Background:

In the workflows I previously recommended to users, a TxDb was built
along the way to making count tables, and it was desirable that the
GTF release information was *automatically* passed into the metadata
of the rowRanges of the SummarizedExperiment.

For example, in parathyroidSE, using makeTxDbFromBiomart:

...
BioMart database version : chr "ENSEMBL GENES 72 (SANGER UK)"
BioMart dataset : chr "hsapiens_gene_ensembl"
BioMart dataset description : chr "Homo sapiens genes (GRCh37.p11)"

Alternatively, with makeTxDbFromGFF, the release would be present in
the name of the GTF file:

...
Data source: chr "Homo_sapiens.GRCh37.75.gtf"


Question:

I'm now interested in switching over to using GTF files available
through AnnotationHub, and wondering how we can maintain this
automatic propagation of metadata.

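Here's some example code. Note that the resulting TxDb metadata
records the genome (WBcel235) but not the Ensembl release or the
AnnotationHub ID: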

library(GenomicFeatures)
library(AnnotationHub)
ah <- AnnotationHub()
z <- query(ah, c("Ensembl","gtf","Caenorhabditis elegans","release-80"))
stopifnot(length(z) == 1)
z$title

 [1] "Caenorhabditis_elegans.WBcel235.80.gtf"

gtf <- ah[[names(z)]]
metadata(gtf)

 $AnnotationHubName
 [1] "AH47045"

txdb <- makeTxDbFromGRanges(gtf)
metadata(txdb)

                                        name                                        value
 1                                   Db type                                         TxDb
 2                        Supporting package                              GenomicFeatures
 3                                    Genome                                     WBcel235
 4                           transcript_nrow                                        57834
 5                                 exon_nrow                                       173506
 6                                  cds_nrow                                       131562
 7                             Db created by    GenomicFeatures package from Bioconductor
 8                             Creation time 2015-06-17 09:31:10 +0200 (Wed, 17 Jun 2015)
 9  GenomicFeatures version at creation time                                      1.21.13
 10         RSQLite version at creation time                                        1.0.0
 11                          DBSCHEMAVERSION                                          1.1
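
One possible manual workaround, sketched here under assumptions and
not the automatic propagation asked about: if makeTxDbFromGRanges in
the installed GenomicFeatures accepts a 'metadata' argument (a
two-column data.frame with character columns "name" and "value"),
the hub record's ID and title could be passed in by hand. The helper
hubMetadataForTxDb and the three metadata names it uses are
hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of any package): turn an AnnotationHub
# record ID and title into the two-column name/value data.frame form
# used for TxDb metadata. The row names chosen here are assumptions.
hubMetadataForTxDb <- function(hub_id, title) {
  data.frame(
    name  = c("Data source", "AnnotationHub ID", "Resource title"),
    value = c("AnnotationHub", hub_id, title),
    stringsAsFactors = FALSE
  )
}

# Values copied from the session above.
md <- hubMetadataForTxDb("AH47045",
                         "Caenorhabditis_elegans.WBcel235.80.gtf")

# With the objects from the session above this would become (not run
# here, since it needs the hub download, and it assumes that
# makeTxDbFromGRanges has a 'metadata' argument in this version):
# md   <- hubMetadataForTxDb(names(z), z$title)
# txdb <- makeTxDbFromGRanges(gtf, metadata = md)
# metadata(txdb)
```

The release number would then travel with the TxDb via the resource
title, though the user still has to supply it explicitly.
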


sessionInfo()

 R Under development (unstable) (2015-04-29 r68278)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)
 Running under: OS X 10.10.3 (Yosemite)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices datasets  utils
 [8] methods   base

 other attached packages:
  [1] AnnotationHub_2.1.26    GenomicFeatures_1.21.13 AnnotationDbi_1.31.16
  [4] Biobase_2.29.1          GenomicRanges_1.21.15   GenomeInfoDb_1.5.7
  [7] IRanges_2.3.11          S4Vectors_0.7.5         BiocGenerics_0.15.2
 [10] testthat_0.10.0         knitr_1.10              BiocInstaller_1.19.6

 loaded via a namespace (and not attached):
  [1] Rcpp_0.11.6                  compiler_3.3.0
  [3] futile.logger_1.4.1          XVector_0.9.1
  [5] bitops_1.0-6                 futile.options_1.0.0
  [7] tools_3.3.0                  zlibbioc_1.15.0
  [9] biomaRt_2.25.1               digest_0.6.8
 [11] RSQLite_1.0.0                memoise_0.2.1
 [13] shiny_0.12.1                 DBI_0.3.1
 [15] rtracklayer_1.29.10          httr_0.6.1
 [17] stringr_1.0.0                Biostrings_2.37.2
 [19] R6_2.0.1                     XML_3.98-1.2
 [21] BiocParallel_1.3.26          lambda.r_1.1.7
 [23] magrittr_1.5                 Rsamtools_1.21.8
 [25] htmltools_0.2.6              GenomicAlignments_1.5.9
 [27] SummarizedExperiment_0.1.5   xtable_1.7-4
 [29] mime_0.3                     interactiveDisplayBase_1.7.0
 [31] httpuv_1.3.2                 stringi_0.4-1
 [33] RCurl_1.95-4.6               crayon_1.3.0


