[Bioc-devel] 'memory not mapped' in trimLRpatterns
Hervé Pagès
hpages at fhcrc.org
Fri Apr 25 13:11:45 CEST 2014
Hi Michael,
Thanks for the report. I'll look into this.
H.
On 04/22/2014 08:29 AM, Michael Stadler wrote:
> Dear Herve,
>
> We are hitting a 'memory not mapped' problem when using trimLRpatterns
> as detailed below. I did not manage to reproduce it with few sequences,
> so I have to refer to a publicly available sequence file with many
> reads. Even then, it occasionally runs through without problems.
>
> Also, our use-case may not be typical and be part of the problem - maybe
> the solution will be to change our use of trimLRpatterns.
>
> Here is some code to illustrate/reproduce the problem:
>
> library(Biostrings)
> library(ShortRead)
>
> Rpat <- "TGGAATTCTCGGGTGCCAAGG"
> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))
>
> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
> "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))
>
> # the second read is not trimmed because it runs through the adaptor
> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
> # A DNAStringSet instance of length 2
> # width seq
> #[1] 4 AAAA
> #[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT
>
> # as a workaround, we pad the adaptor with Ns and
> # increase the mismatch tolerance
> numNs <- 90
> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")
>
> # now, also the second read gets trimmed
> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
> # A DNAStringSet instance of length 2
> # width seq
> #[1] 4 AAAA
> #[2] 4 AAAA
>
> # to trigger the segmentation fault, many reads are needed
> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz",
> "ERR000916_1.fastq.gz")
> fq2 <- readFastq("ERR000916_1.fastq.gz")
>
> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
> # *** caught segfault ***
> #address 0x7f5109be4fed, cause 'memory not mapped'
> #
> #Traceback:
> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
> at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type,
> auto.reduce.pattern, PACKAGE = "Biostrings")
> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
> min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
> auto.reduce.pattern)
> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
> ending.at = subject_width, max.mismatch = max.Rmismatch,
> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
> ending.at = subject_width, max.mismatch = max.Rmismatch,
> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
> Rfixed)
> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
> max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
> Rfixed, ranges)
> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges
> = TRUE)
> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges
> = TRUE)
> #10: eval(expr, envir, enclos)
> #11: eval(call, sys.frame(sys.parent()))
> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges =
> TRUE)
> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
> maxRmm, with.Rindels = FALSE)
> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
> maxRmm, with.Rindels = FALSE)
>
> The problem did not occur in R 3.0.3 with BioC 2.13.
> Do you have an idea what's wrong?
>
> Thanks for your help,
> Michael
>
>
>
> sessionInfo()R version 3.1.0 (2014-04-10)
> #Platform: x86_64-unknown-linux-gnu (64-bit)
> #
> #locale:
> #[1] C
> #
> #attached base packages:
> #[1] parallel stats graphics grDevices utils datasets methods
> #[8] base
> #
> #other attached packages:
> # [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0
>
> # [4] Rsamtools_1.16.0 GenomicRanges_1.16.0 GenomeInfoDb_1.0.1
>
> # [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0
>
> #[10] IRanges_1.22.2 BiocGenerics_0.10.0 RColorBrewer_1.0-5
>
> #
> #loaded via a namespace (and not attached):
> # [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0
> # [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1
> # [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8
> #[10] compiler_3.1.0 digest_0.6.4 fail_1.2
> #[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3
> #[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26
> #[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0
> #[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list