[BioC] AllelicImbalance with VCFs?

Valerie Obenchain vobencha at fhcrc.org
Thu Jan 16 19:11:16 CET 2014


It looks like the bcf functions need substantial work. Development has 
been focused on vcf and bcf has been left in the dust. Unfortunately we 
don't have the resources to dedicate to this right now. If someone else 
wants to pick it up that would be great, if not, it has to stay on the 
back burner.

The current bcf code handles 6 geno fields (PL, DP, GQ, SP, GT and GL. 
(See Rsamtools::.bcf_template). AllelicImbalance must only need a subset 
of these fields (or none?).

If you are starting with a vcf you can selectively import one or more of 
these geno fields with a param.

param <- ScanVcfParam(geno=c("PL", "DP", "GT"))
readVcf(fl, "genome", param=param)

In this example it looks like the ASEset only needs the rowData (ranges) 
from the vcf:

>>> #Alternatively you can use readVcf on the original .vcf file and then
>>> create the ASEset-object like this:
>>> VCF <- readVcf("pathToFile")
>>> gr <-  rowData(VCF)
>>> countList <- getAlleleCount(reads, gr, verbose=FALSE)
>>> a <- ASEsetFromCountList(rowData = gr, countList)

To import just ranges with no info or geno use this param setup:

param <- ScanVcfParam(info=NA, geno=NA)
readVcf(fl, "genome", param=param)


Valerie


On 01/15/2014 08:55 AM, Valerie Obenchain wrote:
> Hi Tim, Jesper,
>
> No, scanVcf() wasn't intended for reading bcf. It sounds like this is a
> bug in scanBcf(). I'll look into it and get back to you.
>
> Thanks.
> Valerie
>
>
> On 01/14/2014 11:54 AM, Jesper Robert Gadin wrote:
>> Hey again,
>>
>> I experimented a little using scanVcf as import-function for ".bcf
>> files",
>> but unfortunately it did not work as well as I hoped. Actually not
>> sure if
>> scanVcf is meant to be used on .bcf files...
>>
>> #took out ex2.vcf that is the example file in the
>> VariantAnnotation-package. Processed it as follows:
>>> asBcf("ex2.vcf",c("20"),"~/Desktop/ex2",overwrite=TRUE)
>>> b <- scanBcf("ex2.bcf")
>> Error: scanBcf: failed to find fmt encoded as '18513'
>>    path: ex2.bcf
>> #which gives an error similar to yours
>> #then tried
>> s <- scanVcf("ex2.bcf")
>> #No error!
>> #but the output is to me not really meaningful
>>
>> #My guess is that your bcf file is corrupt in some way. Possibly because
>> you used the asBcf() function.
>> #If you instead use 'Bcftools' for this, the code below is something I
>> know
>> works (important to use same reference as was used when mapping):
>> samtools mpileup -ugSf ../References/Bowtie2Hg19/hg19.fa -r chr1
>>   AllSamples/*.bam | bcftools view -bvcg - > samples-chr1.bcf
>>
>> #Alternatively you can use readVcf on the original .vcf file and then
>> create the ASEset-object like this:
>> VCF <- readVcf("pathToFile")
>> gr <-  rowData(VCF)
>> countList <- getAlleleCount(reads, gr, verbose=FALSE)
>> a <- ASEsetFromCountList(rowData = gr, countList)
>>
>> /J.R.
>>
>>
>> On Tue, Jan 14, 2014 at 1:24 PM, Jesper Robert Gadin
>> <j.r.gadin at gmail.com>wrote:
>>
>>> Hi Tim,
>>>
>>> Glad to hear you find the package useful also for validation!
>>>
>>> 1) There is no implementation of scanVcf, because I didn't want the
>>> package to depend on too many other packages. Didn't have problems
>>> with the
>>> scanBcf myself, and therefore saw no reason to include scanVcf. That was
>>> the reason, but of course I can add an scanVcf option in the
>>> import-method!
>>>
>>> 2)  Did you mean that the default argument for searchArea could include
>>> all the ranges in the provided bcf/vcf-file? If so I agree; will include
>>> that asap.
>>>
>>> Expect the new features before next week.
>>>
>>> /J.R.
>>>
>>>
>>> On Mon, Jan 13, 2014 at 8:21 PM, Tim Triche, Jr.
>>> <tim.triche at usc.edu>wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm looking to verify some RNAseq results using (in some cases phased)
>>>> SNP calls from the CEU trio and from primary samples in-house (case and
>>>> matched control).  AllelicImbalance looks like a terrific library
>>>> for this,
>>>> however my BCFs can't be read by Rsamtools::scanBcf.  (I get an error
>>>> "Error: scanBcf: failed to find fmt encoded as '16977'  path:
>>>> /scratch/BAMs/LIU1675A8.hg19.bcf")
>>>>
>>>> VariantAnnotation::scanVcf, on the other hand, works fine.
>>>>
>>>> So, is there any particular reason why I couldn't hack up
>>>> the impBcfGL/impBcfGRL method to use VCFs?  Also, would it make
>>>> sense to
>>>> have a default argument for searchArea that falls back to the
>>>> assembly in
>>>> the metadata, if one is specified?
>>>>
>>>> The bizarre thing with the BCFs is that I can scan their headers just
>>>> fine... !
>>>>
>>>> Thanks,
>>>>
>>>> Timothy J. Triche, Jr., PhD
>>>> Jane Anne Nohl Division of Hematology
>>>> USC/Norris Comprehensive Cancer Center
>>>>
>>>
>>>
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>


-- 
Valerie Obenchain

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319



More information about the Bioconductor mailing list