[Bioc-devel] 'memory not mapped' in trimLRpatterns
Michael Stadler
michael.stadler at fmi.ch
Fri May 8 11:10:02 CEST 2015
Hi Herve,
Thanks a lot for the fix - it runs smoothly now on both release and devel.
Cheers,
Michael
On 06.05.2015 10:57, Hervé Pagès wrote:
> Hi Michael,
>
> I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2
> (devel). Both should become available via biocLite() on Friday around
> 11am (Seattle time). Thanks for your patience.
>
> Cheers,
> H.
>
>
> On 04/30/2015 10:50 AM, Hervé Pagès wrote:
>> Hi Michael,
>>
>> Thanks for the reminder. I must confess this slipped out of my radar.
>> I'll look into it today and will let you know.
>>
>> H.
>>
>> On 04/30/2015 08:42 AM, Michael Stadler wrote:
>>> Hi Herve,
>>>
>>> I stumbled again over the 'memory not mapped' issue in trimLRpatterns
>>> using updated versions of R and BioC-devel. I guess it does not hit
>>> people very often, but I would highly appreciate if it could be fixed.
>>>
>>> Many thanks,
>>> Michael
>>>
>>> PS: I can reproduce the issue using the code below under:
>>> R version 3.2.0 (2015-04-16)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats4 parallel stats graphics grDevices utils datasets
>>> [8] methods base
>>>
>>> other attached packages:
>>> [1] ShortRead_1.27.1 GenomicAlignments_1.5.1 Rsamtools_1.21.3
>>> [4] GenomicRanges_1.21.5 GenomeInfoDb_1.5.1 BiocParallel_1.3.4
>>> [7] Biostrings_2.37.0 XVector_0.9.0 IRanges_2.3.6
>>> [10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] lattice_0.20-31 bitops_1.0-6 grid_3.2.0
>>> [4] futile.options_1.0.0 zlibbioc_1.15.0 hwriter_1.3.2
>>> [7] latticeExtra_0.6-26 futile.logger_1.4.1 lambda.r_1.1.7
>>> [10] Biobase_2.29.0
>>>
>>>
>>> On 25.04.2014 13:11, Hervé Pagès wrote:
>>>> Hi Michael,
>>>>
>>>> Thanks for the report. I'll look into this.
>>>>
>>>> H.
>>>>
>>>> On 04/22/2014 08:29 AM, Michael Stadler wrote:
>>>>> Dear Herve,
>>>>>
>>>>> We are hitting a 'memory not mapped' problem when using trimLRpatterns
>>>>> as detailed below. I did not manage to reproduce it with few
>>>>> sequences,
>>>>> so I have to refer to a publicly available sequence file with many
>>>>> reads. Even then, it occasionally runs through without problems.
>>>>>
>>>>> Also, our use-case may not be typical and be part of the problem -
>>>>> maybe
>>>>> the solution will be to change our use of trimLRpatterns.
>>>>>
>>>>> Here is some code to illustrate/reproduce the problem:
>>>>>
>>>>> library(Biostrings)
>>>>> library(ShortRead)
>>>>>
>>>>> Rpat <- "TGGAATTCTCGGGTGCCAAGG"
>>>>> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))
>>>>>
>>>>> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
>>>>> "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))
>>>>>
>>>>> # the second read is not trimmed because it runs through the adaptor
>>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>>>> # A DNAStringSet instance of length 2
>>>>> # width seq
>>>>> #[1] 4 AAAA
>>>>> #[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT
>>>>>
>>>>> # as a workaround, we pad the adaptor with Ns and
>>>>> # increase the mismatch tolerance
>>>>> numNs <- 90
>>>>> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
>>>>> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")
>>>>>
>>>>> # now, also the second read gets trimmed
>>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
>>>>> # A DNAStringSet instance of length 2
>>>>> # width seq
>>>>> #[1] 4 AAAA
>>>>> #[2] 4 AAAA
>>>>>
>>>>> # to trigger the segmentation fault, many reads are needed
>>>>> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz",
>>>>>
>>>>>
>>>>>
>>>>> "ERR000916_1.fastq.gz")
>>>>> fq2 <- readFastq("ERR000916_1.fastq.gz")
>>>>>
>>>>> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat,
>>>>> max.Rmismatch=maxRmm)
>>>>> # *** caught segfault ***
>>>>> #address 0x7f5109be4fed, cause 'memory not mapped'
>>>>> #
>>>>> #Traceback:
>>>>> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>>>> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
>>>>> at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type,
>>>>> auto.reduce.pattern, PACKAGE = "Biostrings")
>>>>> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
>>>>> min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
>>>>> auto.reduce.pattern)
>>>>> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>>> ending.at = subject_width, max.mismatch = max.Rmismatch,
>>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern =
>>>>> TRUE)
>>>>> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
>>>>> ending.at = subject_width, max.mismatch = max.Rmismatch,
>>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern =
>>>>> TRUE)
>>>>> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
>>>>> Rfixed)
>>>>> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
>>>>> max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
>>>>> Rfixed, ranges)
>>>>> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
>>>>> ranges
>>>>> = TRUE)
>>>>> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
>>>>> ranges
>>>>> = TRUE)
>>>>> #10: eval(expr, envir, enclos)
>>>>> #11: eval(call, sys.frame(sys.parent()))
>>>>> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
>>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,
>>>>> ranges =
>>>>> TRUE)
>>>>> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>>>> maxRmm, with.Rindels = FALSE)
>>>>> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
>>>>> maxRmm, with.Rindels = FALSE)
>>>>>
>>>>> The problem did not occur in R 3.0.3 with BioC 2.13.
>>>>> Do you have an idea what's wrong?
>>>>>
>>>>> Thanks for your help,
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> sessionInfo()R version 3.1.0 (2014-04-10)
>>>>> #Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>> #
>>>>> #locale:
>>>>> #[1] C
>>>>> #
>>>>> #attached base packages:
>>>>> #[1] parallel stats graphics grDevices utils datasets
>>>>> methods
>>>>> #[8] base
>>>>> #
>>>>> #other attached packages:
>>>>> # [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0
>>>>>
>>>>> # [4] Rsamtools_1.16.0 GenomicRanges_1.16.0
>>>>> GenomeInfoDb_1.0.1
>>>>>
>>>>> # [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0
>>>>>
>>>>> #[10] IRanges_1.22.2 BiocGenerics_0.10.0
>>>>> RColorBrewer_1.0-5
>>>>>
>>>>> #
>>>>> #loaded via a namespace (and not attached):
>>>>> # [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0
>>>>> # [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1
>>>>> # [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8
>>>>> #[10] compiler_3.1.0 digest_0.6.4 fail_1.2
>>>>> #[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3
>>>>> #[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26
>>>>> #[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0
>>>>> #[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>
>
More information about the Bioc-devel
mailing list