[BioC] rsamtools scanBcf segfault
Valerie Obenchain
vobencha at fhcrc.org
Thu Sep 15 01:40:35 CEST 2011
Hi Rob,
I wanted to point out some new functionality in Rsamtools (in devel) for
reading vcf files,
scanVcfHeader
scanVcf
unpackVcf
and one function in the VariantAnnotation package (also in devel),
readVcf
scanVcf returns a list of the 9 elements in a VCF file and unpackVcf
further parses the INFO and GENO fields. readVcf() in VariantAnnotation
puts the data into a SummarizedExperiment object.
Feedback on these functions is welcome.
Valerie
On 07/28/11 06:36, Martin Morgan wrote:
> On 07/28/2011 12:08 AM, Rob Syme wrote:
>>> Thanks Rob for the nice reproducible example. The problem is when
>>> you open
>>> the BcfFile; 'mode' should be 'rb' to indicate that you want to read a
>>> binary file. Or leave it unspecified and the heuristic will do the
>>> right
>>> thing.
>
> Thanks that'll be corrected.
>
>>
>> Thanks Martin, the manual at
>> http://www.bioconductor.org/packages/2.8/bioc/manuals/Rsamtools/man/Rsamtools.pdf
>>
>> includes a minor error that threw me off. The "Usage" section is
>> correct (and I should have read this more closely), but in the
>> "Arguments" section, the mode argument is described as:
>> "A character(1) vector; mode="rw" indicates a binary (BCF) file,
>> mode="r" a text (VCF) file"
>> when it should probably be
>> "A character(1) vector; mode="rb" indicates a binary (BCF) file,
>> mode="r" a text (VCF) file"
>>
>>
>> I have another error - and I've had a closer read of the manual this
>> time ;)
>> I have generated a filtered vcf using freebayes, but the following
>> commands throw an error:
>>
>> wget
>> http://gist.github.com/raw/1111101/f8629dfa20b15893684c78a586f321dbc65c6b6c/subset.vcf
>> grep -v "^#" subset.vcf | cut -f 1 | sort -u> subset.dict
>> bcftools view -bSD subset.dict subset.vcf> subset.bcf
>> bcftools index subset.bcf
>> wget
>> http://gist.github.com/raw/1111101/5684d61541b39fb8c02e765d5f7891d9d122244e/scanBcf_error.R
>> Rscript scanBcf_error.R
>>
>> # This gives the error:
>> Error: scanBcf: failed to find fmt encoded as '21057'
>> path:<<redacted>>/subset.bcf
>> Execution halted
>
> The problem is that the current implementation isn't really general,
> it implements the formats supported (computed on, rather than just
> echoed) by bcftools -- PL, DP, GQ, SP, GT, GL -- whereas vcf is really
> much more flexible than that. This is cryptically documented on
> ?scanBcf, but I'd like to make this more flexible and will work on this.
>
>> # Calling scanBcfHeader also gives an error:
>> scanBcfHeader(bcf)
>> Error in mapply(f, ..., SIMPLIFY = FALSE) :
>> 'names' attribute [5] must be the same length as the vector [4]
>
> This has been addressed previously in the devel branch.
>
> Thanks Rob for the reports.
>
> Martin
>
>>
>>
>> I can't see anything wrong with the VCF, am I missing something
>> obvious again?
>
>
More information about the Bioconductor
mailing list