[BioC] Complementing alleles in a VCF
Alex Gutteridge
alexg at ruggedtextile.com
Thu Apr 19 16:53:06 CEST 2012
On 19.04.2012 14:34, Martin Morgan wrote:
> Hi Alex --
>
> On 04/19/2012 02:39 AM, Alex Gutteridge wrote:
>> Hi All,
>>
>> I'm having some difficulty with VCF handling and feeding 1000 genome
>> VCFs into predictCoding from the VariantAnnotation package. Can
>> anyone
>> help?
>>
[...]
>
> as an example
>
> fl <- system.file("extdata", "ex2.vcf",
> package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
>
> you can set the strand on vcf with
>
>> strand(rowData(vcf))
> 'factor' Rle of length 5 with 1 run
> Lengths: 5
> Values : *
> Levels(3): + - *
>> strand(rowData(vcf)) <- "+"
>> strand(rowData(vcf))
> 'factor' Rle of length 5 with 1 run
> Lengths: 5
> Values : +
> Levels(3): + - *
>
> (probably there should be strand,VCF, and strand<-,VCF,ANY methods).
>
> For alt, it seems like DNAStringSetList needs a constructor
>
> DNAStringSetList <-
> function(...)
> {
> listData <- list(...)
> if (length(listData) == 1 && is.list(listData[[1L]]))
> listData <- listData[[1L]]
> ok <- sapply(listData, is, "DNAStringSet")
> if (!all(ok))
> listData <- lapply(listData, as, "DNAStringSet")
> IRanges:::newCompressedList("DNAStringSetList", listData)
> }
>
> and then
>
>> orig <- values(alt(vcf))[["ALT"]]
>> dna <- complement(unlist(orig))
>> alt(vcf) <- relist(dna, orig)
>
> (It's confusing, though convenient, how alt() returns a GRanges, but
> alt<- expects 'value' to be a DNAStringSetList; also as you note the
> docs say there should be a VCF,DNAStringSet method for alt<-, but
> there doesn't seem to be one (it wouldn't be appropriate in your
> case,
> because ALT is a DNAStringSetList).
>
> Martin
Thanks Martin - very helpful. In the end I ran predictCoding after
ripping the Granges and varAllele information out of the vcf separately.
This works, but handling the VCF directly would seem more elegant:
> predictCoding(rowData(vcf), txdb, Hsapiens,
> complement(unlist(values(alt(vcf.filt))[["ALT"]])))
--
Alex Gutteridge
More information about the Bioconductor
mailing list