[Bioc-devel] restrictToSNV for VCF

Valerie Obenchain vobencha at fhcrc.org
Wed Mar 19 21:26:45 CET 2014


Thanks for the feedback.

I'll look into nchar for XStringSetList.

I'm in favor of supporting isDeletion(), isInsertion(), isIndel() and 
isSNV() for the VCF classes and removing restrictToSNV(). I could add an 
argument 'all_alt' or 'all_alt_agreement' to be used with CollapsedVCF 
in the case where not all alternate alleles meet the criteria.

Here are the current definitions:

> isDeletion <- function(x) {
>   nchar(alt(x)) == 1L & nchar(ref(x)) > 1L & substring(ref(x), 1, 1) == alt(x)
> }
>
> isInsertion <- function(x) {
>   nchar(ref(x)) == 1L & nchar(alt(x)) > 1L & substring(alt(x), 1, 1) == ref(x)
> }
>
> isIndel <- function(x) {
>   isDeletion(x) | isInsertion(x)
> }
>
> isSNV <- function(x) {
>   nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
> }


Valerie


On 03/19/2014 01:07 PM, Vincent Carey wrote:
>
>
>
> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>> wrote:
>
>     It would be nice to have functions like isSNV, isIndel, isDeletion,
>     etc that at least provide precise definitions of the terminology.
>     I've added these, but they're designed only for VRanges. Should work
>     for ExpandedVCF.
>
>     Also, it would be nice if restrictToSNV just assumed that alt(x)
>     must be something with nchar() support (with special handling for
>     any List), so that the 'character' vector of alt,VRanges would work
>     immediately. Basically restrictToSNV should just be x[isSNV(x)]. Is
>     there even a use-case for the restrictToSNV abstraction if we did that?
>
>
> for VCF instance it would be x[isSNV(x),] and indeed I think that would
> be sufficient.  i like the idea of having this family of predicates for
> variant classes to allow such selections
>
>     Michael
>
>
>
>     On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
>     <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>> wrote:
>
>         Hi,
>
>         I've added a restrictToSNV() function to VariantAnnotation
>         (1.9.46). The return value is a subset VCF object containing
>         SNVs only. The function operates on CollapsedVCF or ExapandedVCF
>         and the alt(VCF) value must be nucleotides (i.e., no structural
>         variants).
>
>         A variant is considered a SNV if the nucleotide sequences in
>         both ref(vcf) and alt(x) are of length 1. I have a question
>         about how variants with multiple 'ALT' values should be handled.
>
>         Should we consider row 4 a SNV? One 'ALT' is length 1, the other
>         is not.
>
>         ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
>         REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>
>                 DataFrame(REF, ALT)
>
>             DataFrame with 4 rows and 2 columns
>                           REF                ALT
>                <DNAStringSet> <DNAStringSetList>
>             1              G                  A
>             2             AA                 TT
>             3              T                G,A
>             4              G               TT,C
>
>
>
>         Thanks.
>         Valerie
>
>         _________________________________________________
>         Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>         mailing list
>         https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
>


-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list