[Bioc-devel] restrictToSNV for VCF

Valerie Obenchain vobencha at fhcrc.org
Wed Apr 9 21:20:18 CEST 2014


Update on these tasks.

1) XStringSetList now has an nchar() method (as of Biostrings 2.31.17)

2) restrictToSNV() was removed from VariantAnnotation

3) The following generics and methods for VCF and VRanges have been 
added to VariantAnnotation 1.9.50:

isSNV()
isInsertion()
isDeletion()
isIndel()
isSubstitution
isTranstion()

I've held off on adding

isSV()
isSVPrecise()

until we have a way to distinguish structural vs non-structual ALT. 
Currently if any of the ALT values are structural, all are coerced to 
character. It would be good to have a way to distinguish a mixture of 
ALT values so we can compute on the nucleotides and do whatever else on 
the structural variants. This may be a project for the next dev cycle.

Valerie


On 03/19/2014 03:29 PM, Michael Lawrence wrote:
> Thanks Sean. Probably also need an "isSubstitution" for any
> substitution, either SNV or complex.
>
>
> On Wed, Mar 19, 2014 at 3:20 PM, Sean Davis <sdavis2 at mail.nih.gov
> <mailto:sdavis2 at mail.nih.gov>> wrote:
>
>
>
>     On Wed, Mar 19, 2014 at 4:26 PM, Valerie Obenchain
>     <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>> wrote:
>
>         Thanks for the feedback.
>
>         I'll look into nchar for XStringSetList.
>
>         I'm in favor of supporting isDeletion(), isInsertion(),
>         isIndel() and isSNV() for the VCF classes and removing
>         restrictToSNV(). I could add an argument 'all_alt' or
>         'all_alt_agreement' to be used with CollapsedVCF in the case
>         where not all alternate alleles meet the criteria.
>
>         Here are the current definitions:
>
>             isDeletion <- function(x) {
>                nchar(alt(x)) == 1L & nchar(ref(x)) > 1L &
>             substring(ref(x), 1, 1) == alt(x)
>             }
>
>             isInsertion <- function(x) {
>                nchar(ref(x)) == 1L & nchar(alt(x)) > 1L &
>             substring(alt(x), 1, 1) == ref(x)
>             }
>
>             isIndel <- function(x) {
>                isDeletion(x) | isInsertion(x)
>             }
>
>             isSNV <- function(x) {
>                nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
>             }
>
>
>
>     To be thorough:
>
>     isTransition()
>
>     isSV()
>
>     isSVPrecise()
>
>     We haven't been using VCF for SVs much yet, but there are probably
>     some fun things to be done on that front.
>
>     Sean
>
>
>
>         Valerie
>
>
>
>         On 03/19/2014 01:07 PM, Vincent Carey wrote:
>
>
>
>
>             On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
>             <lawrence.michael at gene.com
>             <mailto:lawrence.michael at gene.com>
>             <mailto:lawrence.michael at gene.__com
>             <mailto:lawrence.michael at gene.com>>> wrote:
>
>                  It would be nice to have functions like isSNV, isIndel,
>             isDeletion,
>                  etc that at least provide precise definitions of the
>             terminology.
>                  I've added these, but they're designed only for
>             VRanges. Should work
>                  for ExpandedVCF.
>
>                  Also, it would be nice if restrictToSNV just assumed
>             that alt(x)
>                  must be something with nchar() support (with special
>             handling for
>                  any List), so that the 'character' vector of
>             alt,VRanges would work
>                  immediately. Basically restrictToSNV should just be
>             x[isSNV(x)]. Is
>                  there even a use-case for the restrictToSNV abstraction
>             if we did that?
>
>
>             for VCF instance it would be x[isSNV(x),] and indeed I think
>             that would
>             be sufficient.  i like the idea of having this family of
>             predicates for
>             variant classes to allow such selections
>
>                  Michael
>
>
>
>                  On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
>                  <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>             <mailto:vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>>> wrote:
>
>                      Hi,
>
>                      I've added a restrictToSNV() function to
>             VariantAnnotation
>                      (1.9.46). The return value is a subset VCF object
>             containing
>                      SNVs only. The function operates on CollapsedVCF or
>             ExapandedVCF
>                      and the alt(VCF) value must be nucleotides (i.e.,
>             no structural
>                      variants).
>
>                      A variant is considered a SNV if the nucleotide
>             sequences in
>                      both ref(vcf) and alt(x) are of length 1. I have a
>             question
>                      about how variants with multiple 'ALT' values
>             should be handled.
>
>                      Should we consider row 4 a SNV? One 'ALT' is length
>             1, the other
>                      is not.
>
>                      ALT <- DNAStringSetList("A", c("TT"), c("G", "A"),
>             c("TT", "C"))
>                      REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>
>                              DataFrame(REF, ALT)
>
>                          DataFrame with 4 rows and 2 columns
>                                        REF                ALT
>                             <DNAStringSet> <DNAStringSetList>
>                          1              G                  A
>                          2             AA                 TT
>                          3              T                G,A
>                          4              G               TT,C
>
>
>
>                      Thanks.
>                      Valerie
>
>                      ___________________________________________________
>             Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>             <mailto:Bioc-devel at r-project.__org
>             <mailto:Bioc-devel at r-project.org>>
>                      mailing list
>             https://stat.ethz.ch/mailman/____listinfo/bioc-devel
>             <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>                      <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>             <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
>
>
>         --
>         Valerie Obenchain
>         Program in Computational Biology
>         Fred Hutchinson Cancer Research Center
>         1100 Fairview Ave. N, M1-B155
>         P.O. Box 19024
>         Seattle, WA 98109-1024
>
>         E-mail: vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>         Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>         Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
>         _________________________________________________
>         Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>         mailing list
>         https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
>



More information about the Bioc-devel mailing list