[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
Thomas Girke
thomas.girke at ucr.edu
Sun Dec 8 18:08:29 CET 2013
Dear Michael and Valerie,
VariantTools and VariantAnnotation are awesome packages. To the best of my
knowledge, VariantTools is currently the only Bioc/R package that performs
variant calling and it does this in a very nice way. With the available
resources it is now straightforward to set up complete workflows for variant
calling projects: (1) variant-aware read alignments with GSNAP from gmapR ->
(2) variant calling/filtering with VariantTools -> (3) adding genomic context
with VariantAnnotation. This is really amazing!!!
Here are a few questions related to both packages:
(1) For teaching purposes and other obvious reasons it would be useful if a
Windows version of VariantTools were available (and perhaps for gmapR too).
Installing the package (includes gmapR) from source works fine on both Linux
and OS X, but not on Windows.
(2) The VRanges class is another great resource for filtering variant calls.
What I was not able to locate though is a description/definition of the content
of its different columns/components. Is something like this available
somewhere?
(3) When annotating variants with utilities from VariantAnnotation, it would be
useful to provide a convenience Summary Report function at the end of the
workflow that exports the annotations to a file. A very common need here is to
collapse the annotations for each variant onto a single line so that one doesn't
end up with annotation results of millions of lines, as is typical for many
variant discovery projects. This also simplifies joins among different
annotation instances because it maintains uniqueness among variant identifiers.
This approach is often useful when comparing (joining) the variants among
different genotypes (e.g. which variants are identical or unique among
different mutants). An example solution is shown on slides 34-35 of this
presentation:
http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
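To illustrate the kind of collapsing meant in (3), here is a minimal sketch in base R. It is not the code from the slides, and it does not use any VariantAnnotation function: the table `annot`, its column names and the helper `collapse_unique()` are made up for illustration, standing in for annotation results that have already been coerced to a data.frame with one row per variant/feature hit.

```r
# Toy annotation table (made-up data): one row per variant/feature hit,
# so a variant that overlaps two features appears twice.
annot <- data.frame(
  varID    = c("chr1:1033_A/T", "chr1:1033_A/T", "chr2:500_G/C"),
  location = c("fiveUTR", "promoter", "coding"),
  geneID   = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct values of one variant into one string.
collapse_unique <- function(x) paste(unique(x), collapse = ";")

# One row per variant; multiple annotations are collapsed into a single field.
report <- aggregate(
  annot[, c("location", "geneID")],
  by  = list(varID = annot$varID),
  FUN = collapse_unique
)

# Export the collapsed report as a tab-delimited file (temporary file here).
out <- tempfile(fileext = ".tsv")
write.table(report, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because `varID` is unique in `report`, tables produced this way for different genotypes can be joined directly on that column, e.g. with `merge(reportA, reportB, by = "varID", all = TRUE)`.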
(4) predictCoding() reports exactly where, in relative coordinates, a variant
maps within an annotation range. It would be nice if locateVariants() could report the
exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
position x of 5'UTR. Perhaps this is already possible but I couldn't figure
out how to do it without reaching too far into my own hacking toolbox.
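For clarity, the relative location asked for in (4) could be computed along these lines. This is only a sketch in base R under simple assumptions: the function `relative_pos()` is hypothetical, the range coordinates are made up, the feature is treated as a single contiguous range, and positions are 1-based and counted from the 5' end of the feature.

```r
# Hypothetical helper: 1-based position of a variant within a feature range,
# counted from the 5' end of the feature (so strand matters).
relative_pos <- function(pos, start, end, strand = "+") {
  stopifnot(all(pos >= start), all(pos <= end))
  ifelse(strand == "-", end - pos + 1, pos - start + 1)
}

# Variant at chr1:1033 inside a made-up 5'UTR spanning 1001-1100.
relative_pos(1033, start = 1001, end = 1100, strand = "+")
relative_pos(1033, start = 1001, end = 1100, strand = "-")
```

A feature split across several exons would first need its ranges concatenated in transcript order, which this sketch does not attempt.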
Thanks for providing these excellent resources and, most importantly, for your
patience in listening to these unsolicited questions.
Best,
Thomas
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantTools_1.4.5 VariantAnnotation_1.8.7 Rsamtools_1.14.2
[4] Biostrings_2.30.1 GenomicRanges_1.14.3 XVector_0.2.0
[7] IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 BatchJobs_1.1-1135 BBmisc_1.4
[4] Biobase_2.22.0 BiocParallel_0.4.1 biomaRt_2.18.0
[7] bitops_1.0-6 brew_1.0-6 BSgenome_1.30.0
[10] codetools_0.2-8 DBI_0.2-7 digest_0.6.3
[13] fail_1.2 foreach_1.4.1 GenomicFeatures_1.14.2
[16] gmapR_1.4.2 grid_3.0.2 iterators_1.0.6
[19] lattice_0.20-24 Matrix_1.1-0 plyr_1.8
[22] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.0
[25] sendmailR_1.1-2 stats4_3.0.2 tools_3.0.2
[28] XML_3.95-0.2 zlibbioc_1.8.0