[BioC] Complementing alleles in a VCF

Valerie Obenchain vobencha at fhcrc.org
Tue Apr 24 21:49:41 CEST 2012


Hi Alex,

Thanks for reporting the problem with predictCoding() and the handling 
of variants on different strands. You've probably seen this thread,

   https://stat.ethz.ch/pipermail/bioconductor/2012-April/045072.html

from Jeremiah reporting the same problem. In that exchange Herve 
summarized the fix we will implement and it is next on my list.

As for the other issues in this email,

- strand,VCF, and strand<-,VCF methods have been added

- DNAStringSetList() constructor has been added to Biostrings, see 
?DNAStringSetList

- The method for alt<-,VCF,DNAStringSet was removed leaving only 
alt<-,VCF,DNAStringSetList and
   alt<-,VCF,CharacterList. Documentation has been updated.

These changes are available in VariantAnnotation 1.3.8 and Biostrings 
2.25.3, both in devel. When I have the fix for predictCoding done I'll 
post to the thread Jeremiah started.

Valerie


On 04/19/2012 07:53 AM, Alex Gutteridge wrote:
> On 19.04.2012 14:34, Martin Morgan wrote:
>> Hi Alex --
>>
>> On 04/19/2012 02:39 AM, Alex Gutteridge wrote:
>>> Hi All,
>>>
>>> I'm having some difficulty with VCF handling and feeding 1000 genome
>>> VCFs into predictCoding from the VariantAnnotation package. Can anyone
>>> help?
>>>
>
> [...]
>
>>
>> as an example
>>
>>   fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
>>   vcf <- readVcf(fl, "hg19")
>>
>> you can set the strand on vcf with
>>
>>> strand(rowData(vcf))
>> 'factor' Rle of length 5 with 1 run
>>   Lengths: 5
>>   Values : *
>> Levels(3): + - *
>>> strand(rowData(vcf)) <- "+"
>>> strand(rowData(vcf))
>> 'factor' Rle of length 5 with 1 run
>>   Lengths: 5
>>   Values : +
>> Levels(3): + - *
>>
>> (probably there should be strand,VCF, and strand<-,VCF,ANY methods).
>>
>> For alt, it seems like DNAStringSetList needs a constructor
>>
>> DNAStringSetList <-
>>     function(...)
>> {
>>     listData <- list(...)
>>     if (length(listData) == 1 && is.list(listData[[1L]]))
>>         listData <- listData[[1L]]
>>     ok <- sapply(listData, is, "DNAStringSet")
>>     if (!all(ok))
>>         listData <- lapply(listData, as, "DNAStringSet")
>>     IRanges:::newCompressedList("DNAStringSetList", listData)
>> }
>>
>> and then
>>
>>> orig <- values(alt(vcf))[["ALT"]]
>>> dna <- complement(unlist(orig))
>>> alt(vcf) <- relist(dna, orig)
>>
>> (It's confusing, though convenient, how alt() returns a GRanges, but
>> alt<- expects 'value' to be a DNAStringSetList; also as you note the
>> docs say there should be a VCF,DNAStringSet method for alt<-, but
>> there doesn't seem to be one (it wouldn't be appropriate in your case,
>> because ALT is a DNAStringSetList).
>>
>> Martin
>
> Thanks Martin - very helpful. In the end I ran predictCoding after 
> ripping the Granges and varAllele information out of the vcf 
> separately. This works, but handling the VCF directly would seem more 
> elegant:
>
>> predictCoding(rowData(vcf), txdb, Hsapiens, 
>> complement(unlist(values(alt(vcf.filt))[["ALT"]])))
>



More information about the Bioconductor mailing list