[Bioc-devel] restrictToSNV for VCF

Hervé Pagès hpages at fhcrc.org
Fri Mar 21 01:20:51 CET 2014


Hi,

On 03/19/2014 01:10 PM, Michael Lawrence wrote:
> You can apparently use 1D extraction for VCF, which is a little surprising;
> I learned it from restrictToSNV.

This is inherited from SummarizedExperiment:

   > example(SummarizedExperiment)

   > se1
   class: SummarizedExperiment
   dim: 200 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

   > se1[1:4]
   class: SummarizedExperiment
   dim: 4 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

To me that means that a SummarizedExperiment has a length
(conceptually), and that this length is the number of rows.
It would actually help if a "length" method were defined. Right now
we get:

   > length(se1)
   [1] 1

Defining it would automatically fix many convenience wrappers around [,
like head(), tail(), rev(), etc., which currently see only 1 element:

   > head(se1)
   class: SummarizedExperiment
   dim: 1 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment

   > rev(se1)
   class: SummarizedExperiment
   dim: 1 6
   exptData(0):
   assays(1): counts
   rownames: NULL
   rowData metadata column names(0):
   colnames(6): A B ... E F
   colData names(1): Treatment
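
For what it's worth, here is a minimal sketch of that effect using a toy
S4 class, not the real SummarizedExperiment. The class name ToyRows and
its slot are made up for illustration. It assumes head() and rev() fall
through to their default methods, which subset with [ based on length():

```r
library(methods)

# Hypothetical stand-in for a row-oriented container (NOT SummarizedExperiment).
setClass("ToyRows", representation(rows = "integer"))

# 1D extraction selects rows, as se1[1:4] does above.
setMethod("[", "ToyRows", function(x, i, j, ..., drop = TRUE) {
    x@rows <- x@rows[i]
    x
})

x <- new("ToyRows", rows = 1:200)

# Without a length() method the S4 object reports length 1, so the
# default head() only ever asks for x[1].
len_before  <- length(x)
head_before <- head(x)@rows

# Define length as the number of rows.
setMethod("length", "ToyRows", function(x) length(x@rows))

# The same wrappers now see all 200 rows without being touched.
len_after  <- length(x)
head_after <- head(x)@rows
rev_after  <- rev(x)@rows
```

With the length method in place, head(x) should hold rows 1 to 6 and
rev(x) rows 200 down to 1, with no change to head() or rev() themselves.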

Following that logic, names(se1) should probably also return rownames(se1).

H.

>
> On Wed, Mar 19, 2014 at 1:07 PM, Vincent Carey
> <stvjc at channing.harvard.edu> wrote:
>
>>
>>
>>
>> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence <
>> lawrence.michael at gene.com> wrote:
>>
>>> It would be nice to have functions like isSNV, isIndel, isDeletion, etc
>>> that at least provide precise definitions of the terminology. I've added
>>> these, but they're designed only for VRanges. Should work for ExpandedVCF.
>>>
>>> Also, it would be nice if restrictToSNV just assumed that alt(x) must be
>>> something with nchar() support (with special handling for any List), so
>>> that the 'character' vector of alt,VRanges would work immediately.
>>> Basically restrictToSNV should just be x[isSNV(x)]. Is there even a
>>> use-case for the restrictToSNV abstraction if we did that?
>>>
>>>
>> for VCF instance it would be x[isSNV(x),] and indeed I think that would be
>> sufficient.  i like the idea of having this family of predicates for
>> variant classes to allow such selections
>>
>>
>>
>>> Michael
>>>
>>>
>>>
>>> On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've added a restrictToSNV() function to VariantAnnotation (1.9.46). The
>>>> return value is a subset VCF object containing SNVs only. The function
>>>> operates on CollapsedVCF or ExpandedVCF and the alt(VCF) value must be
>>>> nucleotides (i.e., no structural variants).
>>>>
>>>> A variant is considered a SNV if the nucleotide sequences in both
>>>> ref(vcf) and alt(vcf) are of length 1. I have a question about how variants
>>>> with multiple 'ALT' values should be handled.
>>>>
>>>> Should we consider row 4 a SNV? One 'ALT' is length 1, the other is not.
>>>>
>>>> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
>>>> REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>>>>
>>>> > DataFrame(REF, ALT)
>>>> DataFrame with 4 rows and 2 columns
>>>>              REF                ALT
>>>>   <DNAStringSet> <DNAStringSetList>
>>>> 1              G                  A
>>>> 2             AA                 TT
>>>> 3              T                G,A
>>>> 4              G               TT,C
>>>>
>>>>
>>>>
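
One way to make the row 4 question concrete is to treat the choice as an
explicit policy. The sketch below is only an illustration on plain
character data: is_snv_sketch() and its 'policy' argument are
hypothetical names, not the VariantAnnotation implementation, and
ref/alt are base R stand-ins for the DNAStringSet and DNAStringSetList
quoted above. "all" requires every ALT allele to have length 1, "any"
requires at least one.

```r
# Toy stand-ins for the REF/ALT columns quoted above (plain character
# data, not DNAStringSet / DNAStringSetList).
ref <- c("G", "AA", "T", "G")
alt <- list("A", "TT", c("G", "A"), c("TT", "C"))

# Hypothetical helper: a row is an SNV when REF has length 1 and the ALT
# alleles satisfy the chosen policy.
is_snv_sketch <- function(ref, alt, policy = c("all", "any")) {
    policy <- match.arg(policy)
    combine <- if (policy == "all") all else any
    # One logical per row: do the ALT alleles pass under this policy?
    alt_ok <- vapply(alt, function(a) combine(nchar(a) == 1L), logical(1))
    nchar(ref) == 1L & alt_ok
}

strict  <- is_snv_sketch(ref, alt, "all")
lenient <- is_snv_sketch(ref, alt, "any")

# The result is a logical vector, so it can be used directly for subsetting.
ref_strict <- ref[strict]
```

Under "all", row 4 is dropped; under "any", it is kept. Row 2 fails
either way because its REF has length 2. The logical vector would then
be used as x[keep] for a 1D container or x[keep, ] for a VCF.
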
>>>> Thanks.
>>>> Valerie
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


