[Bioc-devel] restrictToSNV for VCF
Valerie Obenchain
vobencha at fhcrc.org
Wed Mar 19 21:26:45 CET 2014
Thanks for the feedback.
I'll look into nchar() support for XStringSetList.
I'm in favor of supporting isDeletion(), isInsertion(), isIndel() and
isSNV() for the VCF classes and removing restrictToSNV(). For
CollapsedVCF, where a row can carry several alternate alleles, I could
add an argument 'all_alt' or 'all_alt_agreement' that controls what
happens when only some of the alternate alleles meet the criterion.
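A minimal base-R sketch of how the proposed 'all_alt' argument might behave, assuming a CollapsedVCF-style ALT column is modelled as a plain list of character vectors. The function name isSNVCollapsed() is hypothetical, and reducing each row's per-allele tests with all() versus any() is an assumption about the intended semantics. The data are the REF/ALT values from the original question quoted below.

```r
# Hypothetical helper: not part of VariantAnnotation.
# ref: character vector, one REF per variant.
# alt: list of character vectors, possibly several ALT alleles per variant.
# all_alt = TRUE requires every ALT allele to be a single base;
# all_alt = FALSE (assumed semantics) accepts the row if any allele is.
isSNVCollapsed <- function(ref, alt, all_alt = TRUE) {
    reduce <- if (all_alt) all else any
    vapply(seq_along(ref), function(i) {
        nchar(ref[i]) == 1L && reduce(nchar(alt[[i]]) == 1L)
    }, logical(1))
}

REF <- c("G", "AA", "T", "G")
ALT <- list("A", "TT", c("G", "A"), c("TT", "C"))

isSNVCollapsed(REF, ALT, all_alt = TRUE)   # TRUE FALSE TRUE FALSE
isSNVCollapsed(REF, ALT, all_alt = FALSE)  # TRUE FALSE TRUE TRUE
```

Row 4 (G to TT,C) is the only row whose answer depends on the argument, since just one of its two alternate alleles is a single base.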
Here are the current definitions (written for VRanges, where ref() and
alt() return character vectors):
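> ## Deletion: REF is longer than 1 base, ALT is its single leading base
> isDeletion <- function(x) {
>     nchar(alt(x)) == 1L & nchar(ref(x)) > 1L &
>         substring(ref(x), 1, 1) == alt(x)
> }
>
> ## Insertion: ALT is longer than 1 base, REF is its single leading base
> isInsertion <- function(x) {
>     nchar(ref(x)) == 1L & nchar(alt(x)) > 1L &
>         substring(alt(x), 1, 1) == ref(x)
> }
>
> isIndel <- function(x) {
>     isDeletion(x) | isInsertion(x)
> }
>
> ## SNV: both REF and ALT are a single base
> isSNV <- function(x) {
>     nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
> }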
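The same predicates can be exercised without Bioconductor by rewriting them over plain character vectors. This is a sketch assuming one ALT allele per element (as in VRanges or ExpandedVCF); the two-argument signatures are hypothetical, not the VariantAnnotation API. It also shows that the indel tests depend on the VCF convention of a shared leading base.

```r
# Hypothetical vector-level versions of the predicates above.
# ref and alt are parallel character vectors, one ALT allele per element.
isDeletion <- function(ref, alt)
    nchar(alt) == 1L & nchar(ref) > 1L & substring(ref, 1, 1) == alt

isInsertion <- function(ref, alt)
    nchar(ref) == 1L & nchar(alt) > 1L & substring(alt, 1, 1) == ref

isIndel <- function(ref, alt)
    isDeletion(ref, alt) | isInsertion(ref, alt)

isSNV <- function(ref, alt)
    nchar(ref) == 1L & nchar(alt) == 1L

# SNV, deletion, insertion, multi-base substitution, unanchored change
REF <- c("G", "AT", "A",  "AA", "C")
ALT <- c("A", "A",  "AT", "TT", "GT")

isSNV(REF, ALT)        # TRUE FALSE FALSE FALSE FALSE
isDeletion(REF, ALT)   # FALSE TRUE FALSE FALSE FALSE
isInsertion(REF, ALT)  # FALSE FALSE TRUE FALSE FALSE
isIndel(REF, ALT)      # FALSE TRUE TRUE FALSE FALSE
```

The last variant (C to GT) is neither an SNV nor an indel under these definitions, because the first ALT base does not match REF.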
Valerie
On 03/19/2014 01:07 PM, Vincent Carey wrote:
>
>
>
> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> It would be nice to have functions like isSNV, isIndel, isDeletion,
> etc. that at least provide precise definitions of the terminology.
> I've added these, but they're designed only for VRanges. They should
> also work for ExpandedVCF.
>
> Also, it would be nice if restrictToSNV just assumed that alt(x)
> supports nchar() (with special handling for any List), so that the
> character vector returned by alt,VRanges would work immediately.
> Basically, restrictToSNV should just be x[isSNV(x)]. Is there even a
> use-case for the restrictToSNV abstraction if we did that?
>
>
> For a VCF instance it would be x[isSNV(x), ], and indeed I think that
> would be sufficient. I like the idea of having this family of
> predicates for variant classes to allow such selections.
>
> Michael
>
>
>
> On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
> <vobencha at fhcrc.org> wrote:
>
> Hi,
>
> I've added a restrictToSNV() function to VariantAnnotation
> (1.9.46). It returns a VCF object subset to SNVs only. The
> function operates on CollapsedVCF or ExpandedVCF, and the alt(VCF)
> values must be nucleotide sequences (i.e., no structural variants).
>
> A variant is considered an SNV if the nucleotide sequences in
> both ref(vcf) and alt(vcf) are of length 1. I have a question
> about how variants with multiple 'ALT' values should be handled.
>
> Should we consider row 4 an SNV? One of its 'ALT' values has
> length 1, the other does not.
>
> library(Biostrings)  # also attaches S4Vectors, which provides DataFrame()
> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
> REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>
> DataFrame(REF, ALT)
>
> DataFrame with 4 rows and 2 columns
> REF ALT
> <DNAStringSet> <DNAStringSetList>
> 1 G A
> 2 AA TT
> 3 T G,A
> 4 G TT,C
>
>
>
> Thanks.
> Valerie
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
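A minimal sketch of the suggestion that restrictToSNV() reduce to x[isSNV(x), ]. A data.frame with REF and ALT columns stands in for an ExpandedVCF; ref(), alt(), isSNV() and restrictToSNV() are re-implemented here for illustration only and are not the VariantAnnotation methods. The rows are the four-variant example above, expanded to one ALT allele per row.

```r
# Illustrative stand-ins only: a data.frame plays the role of an ExpandedVCF.
ref <- function(x) x$REF
alt <- function(x) x$ALT

isSNV <- function(x) nchar(alt(x)) == 1L & nchar(ref(x)) == 1L

# Restricting to SNVs is plain row subsetting with the predicate.
restrictToSNV <- function(x) x[isSNV(x), , drop = FALSE]

# Rows 3 and 4 of the original example each expand into two rows.
vcf <- data.frame(
    REF = c("G", "AA", "T", "T", "G",  "G"),
    ALT = c("A", "TT", "G", "A", "TT", "C"),
    stringsAsFactors = FALSE
)

restrictToSNV(vcf)  # keeps rows 1, 3, 4 and 6
```

In expanded form the row-4 question does not arise: the G to C allele is kept and the G to TT allele is dropped.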
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319