[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
Valerie Obenchain
vobencha at fhcrc.org
Mon Dec 9 21:07:34 CET 2013
Hi Thomas,
On 12/08/2013 09:08 AM, Thomas Girke wrote:
> Dear Michael and Valerie,
>
> VariantTools and VariantAnnotation are awesome packages. To the best of my
> knowledge, VariantTools is currently the only Bioc/R package that performs
> variant calling and it does this in a very nice way. With the available
> resources it is now straightforward to set up complete workflows for variant
> calling projects: (1) variant aware read alignments with GSNAP from gmapR ->
> (2) variant calling/filtering with VariantTools -> (3) adding genomic context
> with VariantAnnotation. This is really amazing!!!
>
> Here are a few questions related to both packages:
>
> (1) For teaching purposes and other obvious reasons it would be useful if a
> Windows version of VariantTools were available (and perhaps for gmapR too).
> Installing the package (includes gmapR) from source works fine on both Linux
> and OS X, but not on Windows.
>
> (2) The VRanges class is another great resource for filtering variant calls.
> What I was not able to locate though is a description/definition of the content
> of its different columns/components. Is something like this available
> somewhere?
>
> (3) When annotating variants with utilities from VariantAnnotation, it would
> be useful to provide a convenience Summary Report function at the end of the
> workflow that exports the annotations to a file. A very common need here is to
> collapse the annotations for each variant on a single line so that one doesn't
> end up with annotation results of millions of lines, as is typical for many
> variant discovery projects. This also simplifies joins among different
> annotation instances because it maintains uniqueness among variant identifiers.
> This approach is often useful when comparing (joining) the variants among
> different genotypes (e.g. which variants are identical or unique among
> different mutants). An example solution is shown on slides 34-35 of this
> presentation:
> http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
>
The variantReport() and codingReport() functions look great. Would you
be willing to contribute them to VariantAnnotation?
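
For reference, here is a minimal base R sketch of the one-line-per-variant
collapse described in (3). The data frame and its column names (VARID, GENEID,
LOCATION) are made up for illustration, and this is not the code behind
variantReport() or codingReport(); it only shows the general idea.

```r
# Hypothetical long-format annotation table: one row per variant/annotation pair.
ann <- data.frame(
  VARID    = c("chr1:1033_A/T", "chr1:1033_A/T", "chr2:500_G/C",
               "chr2:500_G/C", "chr3:42_C/T"),
  GENEID   = c("geneA", "geneB", "geneC", "geneC", "geneD"),
  LOCATION = c("coding", "promoter", "fiveUTR", "fiveUTR", "intron"),
  stringsAsFactors = FALSE
)

# Join the distinct values of one column into a single ";"-separated string.
collapse_unique <- function(x) paste(unique(x), collapse = ";")

# One row per variant identifier; repeated annotations are collapsed.
report <- aggregate(ann[c("GENEID", "LOCATION")],
                    by = list(VARID = ann$VARID),
                    FUN = collapse_unique)

# Export as a tab-delimited file (a temporary file is used here).
out_file <- tempfile(fileext = ".txt")
write.table(report, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Because VARID is unique in 'report', tables built this way for different
samples can be joined on that column with merge().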
> (4) predictCoding() reports the relative location where exactly a variant maps
> to an annotation range. It would be nice if locateVariants() could report the
> exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
> position x of 5'UTR. Perhaps this is already possible but I couldn't figure
> out how to do it without reaching too far into my own hacking toolbox.
>
I could add a 'REFLOC' column to the output of locateVariants() that
would essentially be the equivalent of 'CDSLOC' from predictCoding().
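
To illustrate what such a column could hold, here is a rough base R sketch. It
assumes a single contiguous feature range and 1-based coordinates, and it
ignores splicing; relative_loc() is a hypothetical helper, not a function in
VariantAnnotation and not necessarily how 'REFLOC' would be computed.

```r
# Hypothetical helper: 1-based position of a variant within a feature,
# counted from the feature's 5' end (so from the right on the minus strand).
relative_loc <- function(pos, feature_start, feature_end, strand = "+") {
  # The variant must fall inside the feature range.
  stopifnot(all(pos >= feature_start & pos <= feature_end))
  # Recycle strand so that one value can be applied to many positions.
  strand <- rep_len(strand, length(pos))
  ifelse(strand == "-",
         feature_end - pos + 1L,
         pos - feature_start + 1L)
}

# Variant at position 1033 in a feature spanning 1001-1100.
relative_loc(1033L, 1001L, 1100L, "+")  # 33
relative_loc(1033L, 1001L, 1100L, "-")  # 68
```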
Valerie
> Thanks for providing these excellent resources and, most importantly, for your
> patience in listening to these unsolicited questions.
>
> Best,
>
>
> Thomas
>
>
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] VariantTools_1.4.5 VariantAnnotation_1.8.7 Rsamtools_1.14.2
> [4] Biostrings_2.30.1 GenomicRanges_1.14.3 XVector_0.2.0
> [7] IRanges_1.20.6 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 BatchJobs_1.1-1135 BBmisc_1.4
> [4] Biobase_2.22.0 BiocParallel_0.4.1 biomaRt_2.18.0
> [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.30.0
> [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.3
> [13] fail_1.2 foreach_1.4.1 GenomicFeatures_1.14.2
> [16] gmapR_1.4.2 grid_3.0.2 iterators_1.0.6
> [19] lattice_0.20-24 Matrix_1.1-0 plyr_1.8
> [22] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.0
> [25] sendmailR_1.1-2 stats4_3.0.2 tools_3.0.2
> [28] XML_3.95-0.2 zlibbioc_1.8.0
>
--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319