[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
Julian Gehring
julian.gehring at embl.de
Sun Dec 8 18:45:51 CET 2013
Hi Thomas,
> (1) For teaching purposes and other obvious reasons it would be useful if a
> Windows version of VariantTools were available (and perhaps for gmapR too).
> Installing the package (includes gmapR) from source works fine on both Linux
> and OS X, but not on Windows.
Because of the many differences between the operating systems, building a
package like 'gmapR' (and every package that depends on it, like
'VariantTools') is often not possible on Windows. While Michael or
Thomas Wu may know more about the details, I doubt that these packages
will be available for Windows soon. As an alternative, the Bioconductor
Amazon machine images (AMIs) may be useful for you in this context.
> (3) When annotating variants with utilities from VariantAnnotation, it would
> be useful to provide a convenience Summary Report function at the end of the
> workflow that exports the annotations to a file. A very common need here is to
> collapse the annotations for each variant on a single line so that one doesn't
> end up with annotation results of millions of lines, as is typical for many
> variant discovery projects. This also simplifies joins among different
> annotation instances because it maintains uniqueness among variant identifiers.
> This approach is often useful when comparing (joining) the variants among
> different genotypes (e.g. which variants are identical or unique among
> different mutants). An example solution is shown on slides 34-35 of this
> presentation:
> http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
The fact that one variant may have multiple consequences often makes it
harder to report or post-process the results than it would be with a
simple 1:1 mapping. Other software (e.g. ANNOVAR) has the concept of
reporting only the 'most severe' consequence, but the severity ranking
behind this is not well defined and may result in missing interesting
consequences.
Merging the consequences of a variant into a single line, as you have
shown in your slides, may make it difficult to disentangle the
relationship between the consequences. As an example, taking the last
line from your presentation p. 35:
ID: Chr5:6455_T/C
Location: promoter coding
Gene: AT5G01010 AT5G01015 AT5G01020
Here, it is no longer possible to relate the location of the variant
to the affected gene. Out of interest, how are you dealing with this in
your reports?
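
One way to keep a single line per variant without losing this
relationship might be to paste each gene together with its location
before collapsing. Below is a minimal sketch in base R. The data frame
'ann', its column names, and in particular the assignment of locations to
genes are made up for illustration (the collapsed line above does not
reveal the true pairing), and this is not how VariantAnnotation itself
reports results.

```r
# Hypothetical long-format annotation table: one row per variant/gene pair.
# The location assigned to each gene is invented for illustration only.
ann <- data.frame(
  ID       = c("Chr5:6455_T/C", "Chr5:6455_T/C", "Chr5:6455_T/C"),
  Location = c("promoter", "coding", "promoter"),
  Gene     = c("AT5G01010", "AT5G01015", "AT5G01020"),
  stringsAsFactors = FALSE
)

# Bind each location to its gene before collapsing, so the pairing survives.
ann$Consequence <- paste(ann$Gene, ann$Location, sep = ":")

# Collapse to one line per variant; consequences are separated by ";".
collapsed <- aggregate(
  Consequence ~ ID,
  data = ann,
  FUN  = function(x) paste(unique(x), collapse = ";")
)
print(collapsed)
```

The variant ID stays unique, so joins between samples still work, and the
'Consequence' field can be split on ";" and ":" again if the individual
gene/location pairs are needed later.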
Best wishes
Julian