[Bioc-devel] 'memory not mapped' in trimLRpatterns

Hervé Pagès hpages at fredhutch.org
Wed May 6 10:57:19 CEST 2015


Hi Michael,

I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2
(devel). Both should become available via biocLite() on Friday around
11am (Seattle time). Thanks for your patience.

Cheers,
H.


On 04/30/2015 10:50 AM, Hervé Pagès wrote:
> Hi Michael,
>
> Thanks for the reminder. I must confess this slipped out of my radar.
> I'll look into it today and will let you know.
>
> H.
>
> On 04/30/2015 08:42 AM, Michael Stadler wrote:
>> Hi Herve,
>>
>> I stumbled again over the 'memory not mapped' issue in trimLRpatterns
>> using updated versions of R and BioC-devel. I guess it does not hit
>> people very often, but I would highly appreciate if it could be fixed.
>>
>> Many thanks,
>> Michael
>>
>> PS: I can reproduce the issue using the code below under:
>> R version 3.2.0 (2015-04-16)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>>   [1] ShortRead_1.27.1        GenomicAlignments_1.5.1 Rsamtools_1.21.3
>>   [4] GenomicRanges_1.21.5    GenomeInfoDb_1.5.1      BiocParallel_1.3.4
>>   [7] Biostrings_2.37.0       XVector_0.9.0           IRanges_2.3.6
>> [10] S4Vectors_0.7.0         BiocGenerics_0.15.0     RColorBrewer_1.1-2
>>
>> loaded via a namespace (and not attached):
>>   [1] lattice_0.20-31      bitops_1.0-6         grid_3.2.0
>>   [4] futile.options_1.0.0 zlibbioc_1.15.0      hwriter_1.3.2
>>   [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
>> [10] Biobase_2.29.0
>>
>>
>> On 25.04.2014 13:11, Hervé Pagès wrote:
>>> Hi Michael,
>>>
>>> Thanks for the report. I'll look into this.
>>>
>>> H.
>>>
>>> On 04/22/2014 08:29 AM, Michael Stadler wrote:
>>>> Dear Herve,
>>>>
>>>> We are hitting a 'memory not mapped' problem when using trimLRpatterns
>>>> as detailed below. I did not manage to reproduce it with few sequences,
>>>> so I have to refer to a publicly available sequence file with many
>>>> reads. Even then, it occasionally runs through without problems.
>>>>
>>>> Also, our use-case may not be typical and be part of the problem -
>>>> maybe
>>>> the solution will be to change our use of trimLRpatterns.
>>>>
>>>> Here is some code to illustrate/reproduce the problem:
>>>>
>>>> library(Biostrings)
>>>> library(ShortRead)
>>>>
>>>> Rpat <- "TGGAATTCTCGGGTGCCAAGG"
>>>> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))
>>>>
>>>> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
>>>>                         "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))
>>>>
>>>> # the second read is not trimmed because it runs through the adaptor
>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>>> #  A DNAStringSet instance of length 2
>>>> #    width seq
>>>> #[1]     4 AAAA
>>>> #[2]    28 AAAATGGAATTCTCGGGTGCCAAGGTTT
>>>>
>>>> # as a workaround, we pad the adaptor with Ns and
>>>> # increase the mismatch tolerance
>>>> numNs <- 90
>>>> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
>>>> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")
>>>>
>>>> # now, also the second read gets trimmed
>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>>> #  A DNAStringSet instance of length 2
>>>> #    width seq
>>>> #[1]     4 AAAA
>>>> #[2]     4 AAAA
>>>>
>>>> # to trigger the segmentation fault, many reads are needed
>>>> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz",
>>>>
>>>>
>>>> "ERR000916_1.fastq.gz")
>>>> fq2 <- readFastq("ERR000916_1.fastq.gz")
>>>>
>>>> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>>> # *** caught segfault ***
>>>> #address 0x7f5109be4fed, cause 'memory not mapped'
>>>> #
>>>> #Traceback:
>>>> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>>> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
>>>> at.type, max.mismatch, min.mismatch, with.indels, fixed,     ans.type,
>>>> auto.reduce.pattern, PACKAGE = "Biostrings")
>>>> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
>>>> min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
>>>> auto.reduce.pattern)
>>>> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>>    ending.at = subject_width, max.mismatch = max.Rmismatch,
>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
>>>> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>>    ending.at = subject_width, max.mismatch = max.Rmismatch,
>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
>>>> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
>>>>    Rfixed)
>>>> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
>>>> max.Lmismatch,     max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
>>>> Rfixed,     ranges)
>>>> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>>      max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
>>>> ranges
>>>> = TRUE)
>>>> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>>      max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
>>>> ranges
>>>> = TRUE)
>>>> #10: eval(expr, envir, enclos)
>>>> #11: eval(call, sys.frame(sys.parent()))
>>>> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges =
>>>> TRUE)
>>>> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>>> maxRmm,     with.Rindels = FALSE)
>>>> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>>> maxRmm,     with.Rindels = FALSE)
>>>>
>>>> The problem did not occur in R 3.0.3 with BioC 2.13.
>>>> Do you have an idea what's wrong?
>>>>
>>>> Thanks for your help,
>>>> Michael
>>>>
>>>>
>>>>
>>>> sessionInfo()R version 3.1.0 (2014-04-10)
>>>> #Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> #
>>>> #locale:
>>>> #[1] C
>>>> #
>>>> #attached base packages:
>>>> #[1] parallel  stats     graphics  grDevices utils     datasets
>>>> methods
>>>> #[8] base
>>>> #
>>>> #other attached packages:
>>>> # [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0
>>>>
>>>> # [4] Rsamtools_1.16.0        GenomicRanges_1.16.0
>>>> GenomeInfoDb_1.0.1
>>>>
>>>> # [7] BiocParallel_0.6.0      Biostrings_2.32.0       XVector_0.4.0
>>>>
>>>> #[10] IRanges_1.22.2          BiocGenerics_0.10.0
>>>> RColorBrewer_1.0-5
>>>>
>>>> #
>>>> #loaded via a namespace (and not attached):
>>>> # [1] BBmisc_1.5          BatchJobs_1.2       Biobase_2.24.0
>>>> # [4] DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.1
>>>> # [7] bitops_1.0-6        brew_1.0-6          codetools_0.2-8
>>>> #[10] compiler_3.1.0      digest_0.6.4        fail_1.2
>>>> #[13] foreach_1.4.2       grid_3.1.0          hwriter_1.3
>>>> #[16] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26
>>>> #[19] plyr_1.8.1          sendmailR_1.1-2     stats4_3.1.0
>>>> #[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list