[Bioc-devel] restrictToSNV for VCF
Stephanie M. Gogarten
sdmorris at u.washington.edu
Wed Mar 19 19:42:31 CET 2014
I would say rows 1 and 3 are SNVs, but not row 4. For this application
I think a variant has to be an SNV or not, as you can't pass half a
variant. (I suppose you could remove the ALT values with length > 1 and
set those genotypes to missing, but that is both complicated and
unexpected behavior. Also, it could introduce bias into association
testing, since you would get non-random missingness.)
isSNV() in SeqVarTools returns TRUE only if the max length of all
alleles is 1. It also has a logical argument "biallelic" that restricts
the selection to biallelic SNVs - that could be useful here as well.
If biallelic=TRUE, only row 1 would make it into the subset.
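
As a rough illustration of that rule, here is a base-R sketch. It is not
the SeqVarTools implementation: plain character vectors stand in for the
DNAStringSet / DNAStringSetList objects, and max_allele_len() and
is_snv_sketch() are hypothetical names made up for this example. The
default for "biallelic" in the sketch is arbitrary.

```r
# Assumption: plain character data replaces the Biostrings classes,
# so this runs without any Bioconductor packages.
REF <- c("G", "AA", "T", "G")
ALT <- list("A", "TT", c("G", "A"), c("TT", "C"))

# Longest allele (REF or any ALT) for each variant row.
max_allele_len <- function(ref, alt) {
  mapply(function(r, a) max(nchar(c(r, a))), ref, alt, USE.NAMES = FALSE)
}

# A row counts as an SNV only if every allele has length 1;
# with biallelic = TRUE it must also have exactly one ALT allele.
is_snv_sketch <- function(ref, alt, biallelic = FALSE) {
  snv <- max_allele_len(ref, alt) == 1L
  if (biallelic) snv <- snv & lengths(alt) == 1L
  snv
}

is_snv_sketch(REF, ALT)                    # TRUE FALSE TRUE FALSE
is_snv_sketch(REF, ALT, biallelic = TRUE)  # TRUE FALSE FALSE FALSE
```

Row 4 is dropped in both cases because one of its ALT alleles ("TT") is
longer than 1, which is the all-or-nothing behavior argued for above.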
Stephanie
On 3/18/14 4:04 PM, Julian Gehring wrote:
> Hi Valerie,
>
> I would consider G>C an SNV, G>TT not. But I assume that there exists
> no clear consensus on this. How about a flag that optionally lets the
> second pass as an SNV, so everybody can get what they need?
>
> Best wishes
> Julian
>
>
> On 18/03/14 18:36, Valerie Obenchain wrote:
>> Hi,
>>
>> I've added a restrictToSNV() function to VariantAnnotation (1.9.46). The
>> return value is a subset VCF object containing SNVs only. The function
>> operates on CollapsedVCF or ExpandedVCF and the alt(VCF) value must be
>> nucleotides (i.e., no structural variants).
>>
>> A variant is considered a SNV if the nucleotide sequences in both
>> ref(vcf) and alt(vcf) are of length 1. I have a question about how
>> variants with multiple 'ALT' values should be handled.
>>
>> Should we consider row 4 a SNV? One 'ALT' is length 1, the other is not.
>>
>> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
>> REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>> > DataFrame(REF, ALT)
>> DataFrame with 4 rows and 2 columns
>>              REF                ALT
>>   <DNAStringSet> <DNAStringSetList>
>> 1              G                  A
>> 2             AA                 TT
>> 3              T                G,A
>> 4              G               TT,C
>>
>>
>> Thanks.
>> Valerie
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel