[Bioc-devel] 'memory not mapped' in trimLRpatterns

Hervé Pagès hpages at fredhutch.org
Thu Apr 30 19:50:54 CEST 2015


Hi Michael,

Thanks for the reminder. I must confess this slipped out of my radar.
I'll look into it today and will let you know.

H.

On 04/30/2015 08:42 AM, Michael Stadler wrote:
> Hi Herve,
>
> I stumbled again over the 'memory not mapped' issue in trimLRpatterns
> using updated versions of R and BioC-devel. I guess it does not hit
> people very often, but I would highly appreciate if it could be fixed.
>
> Many thanks,
> Michael
>
> PS: I can reproduce the issue using the code below under:
> R version 3.2.0 (2015-04-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>   [1] ShortRead_1.27.1        GenomicAlignments_1.5.1 Rsamtools_1.21.3
>   [4] GenomicRanges_1.21.5    GenomeInfoDb_1.5.1      BiocParallel_1.3.4
>   [7] Biostrings_2.37.0       XVector_0.9.0           IRanges_2.3.6
> [10] S4Vectors_0.7.0         BiocGenerics_0.15.0     RColorBrewer_1.1-2
>
> loaded via a namespace (and not attached):
>   [1] lattice_0.20-31      bitops_1.0-6         grid_3.2.0
>   [4] futile.options_1.0.0 zlibbioc_1.15.0      hwriter_1.3.2
>   [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
> [10] Biobase_2.29.0
>
>
> On 25.04.2014 13:11, Hervé Pagès wrote:
>> Hi Michael,
>>
>> Thanks for the report. I'll look into this.
>>
>> H.
>>
>> On 04/22/2014 08:29 AM, Michael Stadler wrote:
>>> Dear Herve,
>>>
>>> We are hitting a 'memory not mapped' problem when using trimLRpatterns
>>> as detailed below. I did not manage to reproduce it with few sequences,
>>> so I have to refer to a publicly available sequence file with many
>>> reads. Even then, it occasionally runs through without problems.
>>>
>>> Also, our use-case may not be typical and be part of the problem - maybe
>>> the solution will be to change our use of trimLRpatterns.
>>>
>>> Here is some code to illustrate/reproduce the problem:
>>>
>>> library(Biostrings)
>>> library(ShortRead)
>>>
>>> Rpat <- "TGGAATTCTCGGGTGCCAAGG"
>>> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))
>>>
>>> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
>>>                         "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))
>>>
>>> # the second read is not trimmed because it runs through the adaptor
>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>> #  A DNAStringSet instance of length 2
>>> #    width seq
>>> #[1]     4 AAAA
>>> #[2]    28 AAAATGGAATTCTCGGGTGCCAAGGTTT
>>>
>>> # as a workaround, we pad the adaptor with Ns and
>>> # increase the mismatch tolerance
>>> numNs <- 90
>>> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
>>> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")
>>>
>>> # now, also the second read gets trimmed
>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>> #  A DNAStringSet instance of length 2
>>> #    width seq
>>> #[1]     4 AAAA
>>> #[2]     4 AAAA
>>>
>>> # to trigger the segmentation fault, many reads are needed
>>> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz",
>>>
>>> "ERR000916_1.fastq.gz")
>>> fq2 <- readFastq("ERR000916_1.fastq.gz")
>>>
>>> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>> # *** caught segfault ***
>>> #address 0x7f5109be4fed, cause 'memory not mapped'
>>> #
>>> #Traceback:
>>> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
>>> at.type, max.mismatch, min.mismatch, with.indels, fixed,     ans.type,
>>> auto.reduce.pattern, PACKAGE = "Biostrings")
>>> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
>>> min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
>>> auto.reduce.pattern)
>>> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>    ending.at = subject_width, max.mismatch = max.Rmismatch,
>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
>>> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>    ending.at = subject_width, max.mismatch = max.Rmismatch,
>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
>>> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
>>>    Rfixed)
>>> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
>>> max.Lmismatch,     max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
>>> Rfixed,     ranges)
>>> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>      max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges
>>> = TRUE)
>>> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>      max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges
>>> = TRUE)
>>> #10: eval(expr, envir, enclos)
>>> #11: eval(call, sys.frame(sys.parent()))
>>> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges =
>>> TRUE)
>>> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>> maxRmm,     with.Rindels = FALSE)
>>> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>> maxRmm,     with.Rindels = FALSE)
>>>
>>> The problem did not occur in R 3.0.3 with BioC 2.13.
>>> Do you have an idea what's wrong?
>>>
>>> Thanks for your help,
>>> Michael
>>>
>>>
>>>
>>> sessionInfo()R version 3.1.0 (2014-04-10)
>>> #Platform: x86_64-unknown-linux-gnu (64-bit)
>>> #
>>> #locale:
>>> #[1] C
>>> #
>>> #attached base packages:
>>> #[1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> #[8] base
>>> #
>>> #other attached packages:
>>> # [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0
>>>
>>> # [4] Rsamtools_1.16.0        GenomicRanges_1.16.0    GenomeInfoDb_1.0.1
>>>
>>> # [7] BiocParallel_0.6.0      Biostrings_2.32.0       XVector_0.4.0
>>>
>>> #[10] IRanges_1.22.2          BiocGenerics_0.10.0     RColorBrewer_1.0-5
>>>
>>> #
>>> #loaded via a namespace (and not attached):
>>> # [1] BBmisc_1.5          BatchJobs_1.2       Biobase_2.24.0
>>> # [4] DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.1
>>> # [7] bitops_1.0-6        brew_1.0-6          codetools_0.2-8
>>> #[10] compiler_3.1.0      digest_0.6.4        fail_1.2
>>> #[13] foreach_1.4.2       grid_3.1.0          hwriter_1.3
>>> #[16] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26
>>> #[19] plyr_1.8.1          sendmailR_1.1-2     stats4_3.1.0
>>> #[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list