[Bioc-devel] BiomartGeneRegionTrack question
Hahne, Florian
florian.hahne at novartis.com
Wed Jul 27 13:49:02 CEST 2016
Valerie,
I took a closer look at this, and I think a mapping between Ensembl genome versions and UCSC genome identifiers is all that is needed from the Bioconductor side. I can figure out a way to identify the relevant Ensembl archive to load during the Gviz package build.
BioMart provides the version information for all data sets through the listDatasets() function, in the form of a data.frame:
head(ds)
  dataset                         description                                 version
1 oanatinus_gene_ensembl          Ornithorhynchus anatinus genes (OANA5)      OANA5
2 cporcellus_gene_ensembl         Cavia porcellus genes (cavPor3)             cavPor3
3 gaculeatus_gene_ensembl         Gasterosteus aculeatus genes (BROADS1)      BROADS1
4 itridecemlineatus_gene_ensembl  Ictidomys tridecemlineatus genes (spetri2)  spetri2
5 lafricana_gene_ensembl          Loxodonta africana genes (loxAfr3)          loxAfr3
6 choffmanni_gene_ensembl         Choloepus hoffmanni genes (choHof1)         choHof1
As you can see, the species is encoded in the dataset column, but not as a standard term. The genome or assembly version is stored in the version column. With that information and the table provided here (https://genome.ucsc.edu/FAQ/FAQreleases.html), it should be fairly straightforward to set up a manual mapping. If you do not want to go through all the old BioMart archives, you can get a complete listing from this table on the Ensembl web site: http://www.ensembl.org/info/website/archives/assembly.html
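A minimal sketch of such a manual mapping, assuming the data.frame layout shown above: the species name is taken from the description column (the text before " genes ("), and the version column is looked up in a hand-curated table. The table `ensembl2ucsc` is hypothetical and incomplete; its entries are my own assumptions and should be checked against the UCSC releases page. Versions without an entry (here OANA5 and BROADS1) come back as NA, which flags them for manual curation.

```r
# The six rows quoted above; in practice 'ds' would come from listDatasets().
ds <- data.frame(
  dataset = c("oanatinus_gene_ensembl", "cporcellus_gene_ensembl",
              "gaculeatus_gene_ensembl", "itridecemlineatus_gene_ensembl",
              "lafricana_gene_ensembl", "choffmanni_gene_ensembl"),
  description = c("Ornithorhynchus anatinus genes (OANA5)",
                  "Cavia porcellus genes (cavPor3)",
                  "Gasterosteus aculeatus genes (BROADS1)",
                  "Ictidomys tridecemlineatus genes (spetri2)",
                  "Loxodonta africana genes (loxAfr3)",
                  "Choloepus hoffmanni genes (choHof1)"),
  version = c("OANA5", "cavPor3", "BROADS1", "spetri2", "loxAfr3", "choHof1"),
  stringsAsFactors = FALSE
)

# Species name: everything before " genes (" in the description column.
ds$species <- sub(" genes \\(.*\\)$", "", ds$description)

# Hypothetical hand-curated table: Ensembl assembly version -> UCSC genome id.
# Entries are assumptions to be verified against the UCSC FAQ releases table.
ensembl2ucsc <- c(cavPor3 = "cavPor3", spetri2 = "speTri2",
                  loxAfr3 = "loxAfr3", choHof1 = "choHof1")

# Look up each version; names missing from the table yield NA.
ds$ucsc <- unname(ensembl2ucsc[ds$version])
ds[, c("species", "version", "ucsc")]
```

Note that the Ensembl version string does not always match the UCSC identifier exactly (for example the case of "spetri2"), which is one reason a curated table is safer than string heuristics.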
I have already done this exercise for the tables in the Gviz package and could provide a current version with the relevant information. The mapping from UCSC genomes to Ensembl versions does not have to be one-to-one, since Ensembl versions typically go down to the minor release, whereas UCSC only lists the major release. My understanding is that all minor releases are guaranteed to share the same chromosome coordinates and only represent local patches, but you guys surely know more about all of this.
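The many-to-one relationship can be sketched as follows, assuming that Ensembl minor releases are expressed as a ".pN" patch suffix on the assembly name (as in GRCh37.p13). The helper `ensemblToUcsc()` and the table `major2ucsc` are hypothetical and illustrative only; they are not the existing Gviz:::.ucsc2Ensembl() code.

```r
# Illustrative table: major assembly name -> UCSC genome identifier.
major2ucsc <- c(GRCh37 = "hg19", GRCh38 = "hg38",
                NCBIm37 = "mm9", GRCm38 = "mm10")

# Hypothetical helper: collapse patch releases onto their major assembly,
# then look up the UCSC identifier (NA if the assembly is not in the table).
ensemblToUcsc <- function(assembly) {
  major <- sub("\\.p[0-9]+$", "", assembly)  # "GRCh37.p13" -> "GRCh37"
  unname(major2ucsc[major])
}

ensemblToUcsc(c("GRCh37.p13", "GRCh37.p5", "GRCh38.p7", "NCBIm37"))
# [1] "hg19" "hg19" "hg38" "mm9"
```

Keying the table on the major assembly keeps it small: new patch releases map automatically, and only genuinely new assemblies require a manual entry.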
Just let me know about the best way forward.
Florian
On 26/07/16 21:27, "Obenchain, Valerie" <Valerie.Obenchain at RoswellPark.org> wrote:
>Hi Florian,
>
>On 07/21/2016 01:47 AM, Hahne, Florian wrote:
>> This is a problem with the biomaRt package and its connection to the Ensembl archives, not Gviz. Here’s the call that fails:
>> listMarts(host="feb2012.archive.ensembl.org", path="/biomart/martservice")
>>
>> It looks like Ensembl no longer provides a download for the feb2012 archive. You could try the may2012 one, which, according to this table (http://www.ensembl.org/info/website/archives/assembly.html), should still provide the mm9 (NCBIm37) genome:
>>
>> bm <- useMart(host = "may2012.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")
>> bmTrack <- BiomartGeneRegionTrack(start=26682683, end=26711643, chromosome=7, genome="mm9", biomart = bm)
>>
>> I’ll update the automated mapping from UCSC genome identifiers to BioMart within Gviz; however, I am increasingly convinced that this whole setup is not ideal. I simply do not have the time to keep track of all the Ensembl changes and new genome versions. There really should be an annotation package or the like, maintained by Bioconductor core or within the biomaRt package, that maps a UCSC genome identifier to an Ensembl genome version and to the Ensembl archive from which to access it.
>
>The mapping between genome identifiers seems like a natural fit for the
>GenomeInfoDb package. Mapping to a particular ensembl archive might be
>more appropriate to have in biomaRt but I'm open to what people think.
>
>If you're in favor of this re-org we can start incorporating the
>following into GenomeInfoDb:
>
>- Gviz:::.getBiobmart() # or move to biomaRt
>- Gviz:::.ucsc2Ensembl()
>- extdata/biomartVersionsNow.txt
>- extdata/biomartVersionsLatest.txt
>
>Any hints or tips for maintaining the .txt files, i.e., what worked,
>what you might do different the second time around?
>
>Valerie
>
>
>
>>
>> Florian
>>
>>
>> On 20/07/16 21:15, "Bioc-devel on behalf of James W. MacDonald" <bioc-devel-bounces at r-project.org on behalf of jmacdon at uw.edu> wrote:
>>
>>> Hi Holly,
>>>
>>> This list is intended for those that are developing packages. Your question
>>> should be asked on the support site (https://support.bioconductor.org).
>>> Please repost over there.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> On Wed, Jul 20, 2016 at 2:04 PM, Holly <xyang2 at uchicago.edu> wrote:
>>>
>>>> Dear Bioconductor helpers,
>>>> I am trying to plot a region of interest using the Gviz package.
>>>> I got an error when running the following example code:
>>>>> library(Gviz)
>>>>> library(GenomicRanges)
>>>>> bmTrack <- BiomartGeneRegionTrack(start=26682683, end=26711643,
>>>> + chromosome=7, genome="mm9")
>>>> Entity 'nbsp' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'nbsp' not defined
>>>> Entity 'raquo' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'hellip' not defined
>>>> Entity 'hellip' not defined
>>>> Opening and ending tag mismatch: img line 68 and li
>>>> Opening and ending tag mismatch: li line 68 and ul
>>>> Opening and ending tag mismatch: ul line 67 and div
>>>> Entity 'copy' not defined
>>>> Opening and ending tag mismatch: div line 19 and body
>>>> Opening and ending tag mismatch: body line 17 and html
>>>> Premature end of data in tag html line 2
>>>> Error: 1: Entity 'nbsp' not defined
>>>> 2: Entity 'hellip' not defined
>>>> 3: Entity 'hellip' not defined
>>>> 4: Entity 'nbsp' not defined
>>>> 5: Entity 'raquo' not defined
>>>> 6: Entity 'hellip' not defined
>>>> 7: Entity 'hellip' not defined
>>>> 8: Entity 'hellip' not defined
>>>> 9: Entity 'hellip' not defined
>>>> 10: Entity 'hellip' not defined
>>>> 11: Opening and ending tag mismatch: img line 68 and li
>>>> 12: Opening and ending tag mismatch: li line 68 and ul
>>>> 13: Opening and ending tag mismatch: ul line 67 and div
>>>> 14: Entity 'copy' not defined
>>>> 15: Opening and ending tag mismatch: div line 19 and body
>>>> 16: Opening and ending tag mismatch: body line 17 and html
>>>> 17: Premature end of data in tag html line 2
>>>>> sessionInfo()
>>>> R version 3.3.1 (2016-06-21)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] grid parallel stats4 stats graphics grDevices utils
>>>> [8] datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] Gviz_1.16.1 GenomicRanges_1.24.2 GenomeInfoDb_1.8.2
>>>> [4] IRanges_2.6.1 S4Vectors_0.10.2 BiocGenerics_0.18.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] SummarizedExperiment_1.2.3 VariantAnnotation_1.18.3
>>>> [3] splines_3.3.1 lattice_0.20-33
>>>> [5] colorspace_1.2-6 htmltools_0.3.5
>>>> [7] rtracklayer_1.32.1 GenomicFeatures_1.24.4
>>>> [9] chron_2.3-47 interactiveDisplayBase_1.10.3
>>>> [11] survival_2.39-5 XML_3.98-1.4
>>>> [13] foreign_0.8-66 DBI_0.4-1
>>>> [15] ensembldb_1.4.7 BiocParallel_1.6.2
>>>> [17] RColorBrewer_1.1-2 matrixStats_0.50.2
>>>> [19] plyr_1.8.4 zlibbioc_1.18.0
>>>> [21] Biostrings_2.40.2 munsell_0.4.3
>>>> [23] gtable_0.2.0 latticeExtra_0.6-28
>>>> [25] Biobase_2.32.0 biomaRt_2.28.0
>>>> [27] BiocInstaller_1.22.3 httpuv_1.3.3
>>>> [29] AnnotationDbi_1.34.4 Rcpp_0.12.5
>>>> [31] acepack_1.3-3.3 xtable_1.8-2
>>>> [33] BSgenome_1.40.1 scales_0.4.0
>>>> [35] Hmisc_3.17-4 XVector_0.12.0
>>>> [37] mime_0.5 Rsamtools_1.24.0
>>>> [39] gridExtra_2.2.1 AnnotationHub_2.4.2
>>>> [41] ggplot2_2.1.0 digest_0.6.9
>>>> [43] biovizBase_1.20.0 shiny_0.13.2
>>>> [45] tools_3.3.1 bitops_1.0-6
>>>> [47] RCurl_1.95-4.8 RSQLite_1.0.0
>>>> [49] dichromat_2.0-0 Formula_1.2-1
>>>> [51] cluster_2.0.4 Matrix_1.2-6
>>>> [53] data.table_1.9.6 httr_1.2.1
>>>> [55] R6_2.1.2 rpart_4.1-10
>>>> [57] GenomicAlignments_1.8.4 nnet_7.3-12
>>>> Thank you for help,
>>>> Holly
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>