[Bioc-devel] restrictToSNV for VCF
Hervé Pagès
hpages at fhcrc.org
Fri Mar 21 01:23:27 CET 2014
On 03/20/2014 05:20 PM, Hervé Pagès wrote:
[...]
>
> Following that logic names(se1) also probably return colnames(se1).
/\
should
H.
>
> H.
>
>>
>>
>>
>>
>> On Wed, Mar 19, 2014 at 1:07 PM, Vincent Carey
>> <stvjc at channing.harvard.edu>wrote:
>>
>>>
>>>
>>>
>>> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence <
>>> lawrence.michael at gene.com> wrote:
>>>
>>>> It would be nice to have functions like isSNV, isIndel, isDeletion, etc
>>>> that at least provide precise definitions of the terminology. I've
>>>> added
>>>> these, but they're designed only for VRanges. Should work for
>>>> ExpandedVCF.
>>>>
>>>> Also, it would be nice if restrictToSNV just assumed that alt(x)
>>>> must be
>>>> something with nchar() support (with special handling for any List), so
>>>> that the 'character' vector of alt,VRanges would work immediately.
>>>> Basically restrictToSNV should just be x[isSNV(x)]. Is there even a
>>>> use-case for the restrictToSNV abstraction if we did that?
>>>>
>>>>
>>> for VCF instance it would be x[isSNV(x),] and indeed I think that
>>> would be
>>> sufficient. i like the idea of having this family of predicates for
>>> variant classes to allow such selections
>>>
>>>
>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
>>>> <vobencha at fhcrc.org>wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've added a restrictToSNV() function to VariantAnnotation
>>>>> (1.9.46). The
>>>>> return value is a subset VCF object containing SNVs only. The function
>>>>> operates on CollapsedVCF or ExapandedVCF and the alt(VCF) value
>>>>> must be
>>>>> nucleotides (i.e., no structural variants).
>>>>>
>>>>> A variant is considered a SNV if the nucleotide sequences in both
>>>>> ref(vcf) and alt(x) are of length 1. I have a question about how
>>>>> variants
>>>>> with multiple 'ALT' values should be handled.
>>>>>
>>>>> Should we consider row 4 a SNV? One 'ALT' is length 1, the other is
>>>>> not.
>>>>>
>>>>> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
>>>>> REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>>>>>
>>>>>> DataFrame(REF, ALT)
>>>>>>>
>>>>>> DataFrame with 4 rows and 2 columns
>>>>>> REF ALT
>>>>>> <DNAStringSet> <DNAStringSetList>
>>>>>> 1 G A
>>>>>> 2 AA TT
>>>>>> 3 T G,A
>>>>>> 4 G TT,C
>>>>>>
>>>>>
>>>>>
>>>>> Thanks.
>>>>> Valerie
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list