[Bioc-devel] [VariantAnnotation] segfault thrown when scan header for VCF files out of GATK pipeline

Martin Morgan mtmorgan at fhcrc.org
Tue Feb 19 20:58:10 CET 2013


On 02/19/2013 11:44 AM, Tengfei Yin wrote:
> Hi,
>
> I am working with VCF files from a GATK pipeline (usually around
> 400 MB per file), but I ran into problems importing them into R with the
> VariantAnnotation package. This happens in both the release and devel
> versions of VariantAnnotation; I only provide sessionInfo for the
> devel branch. I may not have permission to share the data, so I am
> posting here first to see whether there is an obvious answer I don't know
> yet. If data and a reproducible example are needed, I can work on that.
>
> I don't know whether it's an issue in R, samtools or GATK, and I have no
> problem importing VCF files from a bcftools-only pipeline.
>
> If you need any details, such as the pipeline commands and the versions of
> other software, please let me know. Thanks a lot.

This is probably a problem in the Rsamtools C code that depends only on the
header of the VCF file, and may be triggered by garbage collection. You could
debug further yourself by setting

~/.R/Makevars:
CFLAGS = -g -O0

and then installing Rsamtools from source

   biocLite("Rsamtools", type="source")

and finally running a minimal test script with either

   R -d valgrind -f test.R

or under gdb

   R -d gdb -f test.R
   (gdb) run
   segfault occurs, then
   (gdb) bt

to get a backtrace. Feel free to share the output of either with me off-list,
or, if possible, share just the header of the VCF file.
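
One way to pull out just the header for sharing is to keep the leading lines
that start with '#' and stop at the first record. This is a minimal sketch: the
function name vcf_header and the output file name are hypothetical, and it
assumes an uncompressed, plain-text VCF.

```shell
# Hypothetical helper: print only the leading header lines of a VCF file.
# Header lines start with '#'; awk exits at the first non-header line, so a
# large file is not read to the end.
vcf_header() {
    awk '/^#/ { print; next } { exit }' "$1"
}

# Example usage (the output file name is a placeholder):
# vcf_header ~/GATK-64/GATK_AUTOMATION/VCF/Adams.vcf > Adams_header.vcf
```

If scanVcfHeader() also segfaults on the extracted header file, that file
alone should be enough to reproduce the problem.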

Martin



>
> Tengfei
>
>> scanVcfHeader("~/GATK-64/GATK_AUTOMATION/VCF/Adams.vcf")
> Adams.vcf
>> hdr = scanVcfHeader("~//GATK-64/GATK_AUTOMATION/VCF/Adams.vcf")
>
>   *** caught segfault ***
> address (nil), cause 'memory not mapped'
>
> Traceback:
>   1: .Call(.scan_bcf_header, .extptr(file))
>   2: scanBcfHeader(bf)
>   3: scanBcfHeader(bf)
>   4: (function (file, mode) {    bf <- open(BcfFile(file, character(0),
> ...))    on.exit(close(bf))    scanBcfHeader(bf)})(dots[[1L]][[1L]])
>   5: mapply(FUN = f, ..., SIMPLIFY = FALSE)
>   6: .Method(..., f = f)
>   7: eval(expr, envir, enclos)
>   8: eval(.dotsCall, env)
>   9: eval(.dotsCall, env)
> 10: standardGeneric("Map")
> 11: Map(function(file, mode) {    bf <- open(BcfFile(file, character(0),
> ...))    on.exit(close(bf))    scanBcfHeader(bf)}, file, ...)
> 12: scanBcfHeader(file, ...)
> 13: scanBcfHeader(file, ...)
> 14: scanVcfHeader("~/GATK-64/GATK_AUTOMATION/VCF/Adams.vcf")
> 15: scanVcfHeader("~/GATK-64/GATK_AUTOMATION/VCF/Adams.vcf")
>
>
> My sessioninfo
> R Under development (unstable) (2013-02-17 r61981)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.5.38 Rsamtools_1.11.16        Biostrings_2.27.11
> [4] GenomicRanges_1.11.29    IRanges_1.17.32          BiocGenerics_0.5.6
>
> loaded via a namespace (and not attached):
>   [1] AnnotationDbi_1.21.10   Biobase_2.19.2          biomaRt_2.15.0
>   [4] bitops_1.0-5            BSgenome_1.27.1         DBI_0.2-5
>   [7] GenomicFeatures_1.11.11 RCurl_1.95-3            RSQLite_0.11.2
> [10] rtracklayer_1.19.9      stats4_3.0.0            tools_3.0.0
> [13] XML_3.95-0.1            zlibbioc_1.5.0
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
