[Bioc-devel] [VariantAnnotation] segfault thrown when scanning header of VCF files from GATK pipeline

Martin Morgan mtmorgan at fhcrc.org
Tue Feb 26 21:47:13 CET 2013


On 2/26/2013 8:17 AM, Tengfei Yin wrote:
> Hi Martin,
>
> Sorry for the late update. It's actually not a bug in VariantAnnotation: I
> checked the files and the problem is in the header. We got the files from
> collaborators who combine VCF files with their own scripts; combining them
> with vcftools instead solved the problem.

Tengfei helped to identify the problem as a missing header line ("#CHROM
POS ..."). Rsamtools 1.11.18 in the devel branch now reports an informative
error for such files (an error seems better than silently parsing, since
without this line the samples cannot be identified by name).
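
With an older Rsamtools, which crashes instead of raising that error, it may help to check a file before passing it to scanVcfHeader(). Below is a minimal shell sketch, assuming an uncompressed VCF; `has_chrom_header` is a hypothetical helper name, not part of Rsamtools or VariantAnnotation. It skips the "##" meta-information lines and succeeds only if the next line starts with "#CHROM".

```shell
#!/bin/sh
# Hypothetical helper (not part of Rsamtools/VariantAnnotation):
# exit status 0 if FILE has the "#CHROM POS ..." column-header line right
# after its "##" meta-information lines, 1 otherwise.
# Assumes an uncompressed VCF; decompress bgzip-compressed input first.
has_chrom_header() {
    awk '
        /^##/     { next }            # skip meta-information lines
        /^#CHROM/ { found = 1 }       # the required column-header line
                  { exit }            # first non-"##" line ends the check
        END       { exit !found }     # status 0 if found, 1 otherwise
    ' "$1"
}
```

For example, `has_chrom_header Adams.vcf || echo "Adams.vcf: missing #CHROM line" >&2` flags a file like the one in this thread before it reaches R. Only the first non-"##" line is examined, so the check stays fast even on files of several hundred Mb.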

Martin

>
> Thanks again
>
> Tengfei
>
> On Tue, Feb 19, 2013 at 2:17 PM, Tengfei Yin <yintengfei at gmail.com> wrote:
>
>     Hi Martin,
>
>     Thanks a lot for the quick help!
>
>     I will get back to you off-list with the output and the VCF header after
>     trying what you suggested.
>
>     Tengfei
>
>     On Tue, Feb 19, 2013 at 1:58 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>         On 02/19/2013 11:44 AM, Tengfei Yin wrote:
>
>             Hi,
>
>             I am working with VCF files from a GATK pipeline (usually around
>             400 Mb per file), but I run into a segfault when importing them into
>             R with the VariantAnnotation package. This happens with both the
>             release and devel versions of VariantAnnotation; I only provide
>             sessionInfo for the devel branch. I may not have permission to share
>             the data, so I am posting first to see whether there is an obvious
>             answer I have missed. If data or a reproducible example is needed,
>             I can work on that.
>
>             I don't know whether this is an issue in R, samtools, or GATK, and I
>             have no problem importing VCF files produced by a bcftools-only
>             pipeline.
>
>             If you need any details like command pipeline and version of other
>             software, please let me know. Thanks a lot.
>
>
>         Probably a problem in Rsamtools C code, depending just on the header of
>         the VCF file, maybe triggered by garbage collection. You could debug
>         further yourself by setting
>
>         ~/.R/Makevars:
>         CFLAGS=-g -O0
>
>         and then installing Rsamtools from source
>
>            biocLite("Rsamtools", type="source")
>
>         and finally running a minimal test script with either
>
>            R -d valgrind -f test.R
>
>         or under gdb
>
>            R -d gdb -f test.R
>            (gdb) run
>            segfault occurs, then
>            (gdb) bt
>
>         to get a back trace. Feel free to share the output of either with me
>         off-list, or if possible to share just the header data from the vcf.
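
Sharing just the header of a ~400 Mb VCF does not require reading the whole file. A minimal shell sketch, assuming an uncompressed VCF; `extract_vcf_header` is a hypothetical helper name. It copies the leading lines that begin with "#" and stops at the first data record.

```shell
#!/bin/sh
# Hypothetical helper: copy the header of VCF file SRC (all leading lines
# that start with "#") to DST, stopping at the first data record so the
# rest of a large file is not scanned. Assumes an uncompressed VCF.
extract_vcf_header() {
    awk '/^#/ { print; next } { exit }' "$1" > "$2"
}
```

For example, `extract_vcf_header Adams.vcf Adams-header.vcf` writes only the header lines, which can then be sent off-list or passed to scanVcfHeader() on their own to reproduce the crash.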
>
>         Martin
>
>
>
>
>
>             Tengfei
>
>                 scanVcfHeader("~/GATK-64/GATK___AUTOMATION/VCF/Adams.vcf")
>
>             Adams.vcf
>
>                 hdr = scanVcfHeader("~/GATK-64/GATK___AUTOMATION/VCF/Adams.vcf")
>
>
>                *** caught segfault ***
>             address (nil), cause 'memory not mapped'
>
>             Traceback:
>                1: .Call(.scan_bcf_header, .extptr(file))
>                2: scanBcfHeader(bf)
>                3: scanBcfHeader(bf)
>                4: (function (file, mode) {    bf <- open(BcfFile(file, character(0),
>             ...))    on.exit(close(bf))    scanBcfHeader(bf)})(dots[[1L]][[1L]])
>                5: mapply(FUN = f, ..., SIMPLIFY = FALSE)
>                6: .Method(..., f = f)
>                7: eval(expr, envir, enclos)
>                8: eval(.dotsCall, env)
>                9: eval(.dotsCall, env)
>             10: standardGeneric("Map")
>             11: Map(function(file, mode) {    bf <- open(BcfFile(file, character(0),
>             ...))    on.exit(close(bf))    scanBcfHeader(bf)}, file, ...)
>             12: scanBcfHeader(file, ...)
>             13: scanBcfHeader(file, ...)
>             14: scanVcfHeader("~/GATK-64/GATK___AUTOMATION/VCF/Adams.vcf")
>             15: scanVcfHeader("~/GATK-64/GATK___AUTOMATION/VCF/Adams.vcf")
>
>
>             My sessionInfo:
>             R Under development (unstable) (2013-02-17 r61981)
>             Platform: x86_64-unknown-linux-gnu (64-bit)
>
>             locale:
>                [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                [7] LC_PAPER=C                 LC_NAME=C
>                [9] LC_ADDRESS=C               LC_TELEPHONE=C
>             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>             attached base packages:
>             [1] parallel  stats     graphics  grDevices utils     datasets  methods
>             [8] base
>
>             other attached packages:
>             [1] VariantAnnotation_1.5.38 Rsamtools_1.11.16        Biostrings_2.27.11
>             [4] GenomicRanges_1.11.29    IRanges_1.17.32          BiocGenerics_0.5.6
>
>             loaded via a namespace (and not attached):
>                [1] AnnotationDbi_1.21.10   Biobase_2.19.2          biomaRt_2.15.0
>                [4] bitops_1.0-5            BSgenome_1.27.1         DBI_0.2-5
>                [7] GenomicFeatures_1.11.11 RCurl_1.95-3            RSQLite_0.11.2
>             [10] rtracklayer_1.19.9      stats4_3.0.0            tools_3.0.0
>             [13] XML_3.95-0.1            zlibbioc_1.5.0
>
>
>
>
>         --
>         Computational Biology / Fred Hutchinson Cancer Research Center
>         1100 Fairview Ave. N.
>         PO Box 19024 Seattle, WA 98109
>
>         Location: Arnold Building M1 B861
>         Phone: (206) 667-2793
>
>
>
>
>     --
>     Tengfei Yin
>     MCDB PhD student
>     1620 Howe Hall, 2274,
>     Iowa State University
>     Ames, IA,50011-2274
>
>
>
>
>
>
>


-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109


