[Bioc-devel] bgzip BUG with too many files [WAS: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to close filehandle on tabix files produced]

Martin Morgan mtmorgan at fhcrc.org
Fri Nov 9 23:24:24 CET 2012


I think this is now fixed in Rsamtools 1.10.2 / 1.11.9, available in svn now and,
hopefully, via biocLite on Saturday at 10am Seattle time.

Martin

On 11/09/2012 07:03 AM, Cook, Malcolm wrote:
> Michael,
>
> For easier testing, on my Mac OS X machine I can dial down the limit on the
> number of open files using the shell command `ulimit -n 30`, but YMMV depending
> on OS support.
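
In bash, `ulimit -n 30` without `-S` or `-H` sets both the soft and the hard limit, so the limit cannot be raised again in that shell without privileges. A minimal sketch of confining the change to a subshell follows, assuming a POSIX-like shell; the function name `limit_in_subshell` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: lower the soft open-file limit inside a subshell only
# and print the limit seen there. The parentheses start a subshell, so the
# calling shell keeps its original limit.
limit_in_subshell() {
  (
    ulimit -S -n "$1" || exit 1
    # A test command (for example the R reproduction) would be run here.
    ulimit -S -n
  )
}

echo "parent limit before: $(ulimit -S -n)"
echo "subshell limit:      $(limit_in_subshell 30)"
echo "parent limit after:  $(ulimit -S -n)"
```

Anything started inside the parentheses inherits the lowered limit, which makes the failure reachable after a few dozen open files instead of hundreds.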
>
> In any case, your suspicions were on target.  R function bgzip seems to be the
> culprit, and I am changing subject and cc:ing in Martin and Herve accordingly.
>
> Martin xor Herve,
>
> The problem can be reproduced by just calling bgzip repeatedly. How many calls
> it takes depends on your value for `ulimit -n`:
>
> library(Rsamtools)
>
> bed <- system.file("doc", "example.bed", package="rtracklayer")
>
> replicate(2000, bgzip(bed, 'delme.now', TRUE))
>
> My workaround for now is to perform system calls to do the zipping and tabix
> indexing, so there is no urgency.
>
> sessionInfo() is as below.
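
A minimal sketch of the system-call workaround mentioned above, assuming the command-line `bgzip` and `tabix` tools are on the PATH and that the input is a BED file without header or track lines. The function name `bgzip_and_index` is hypothetical and is not part of Rsamtools or rtracklayer.

```shell
#!/bin/sh
# Hypothetical helper: compress a BED file with bgzip and build a tabix index
# by running the command-line tools as separate processes, so any file
# descriptors they open are released when each process exits.
bgzip_and_index() {
  bed=$1
  for tool in bgzip tabix; do
    command -v "$tool" >/dev/null 2>&1 || {
      echo "$tool not found on PATH" >&2
      return 127
    }
  done
  # tabix expects position-sorted input: sort by sequence name, then start.
  sort -k1,1 -k2,2n "$bed" | bgzip -c > "$bed.gz" || return 1
  # -p bed selects the BED column preset; -f overwrites an existing index.
  tabix -f -p bed "$bed.gz"
}
```

Calling `bgzip_and_index deleteme_1.bed` would write `deleteme_1.bed.gz` and `deleteme_1.bed.gz.tbi` next to the input. From R, the same commands could be issued with `system()`.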
>
> Thanks,
>
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, November 09, 2012 5:52 AM
> *To:* Cook, Malcolm
> *Cc:* bioc-devel at r-project.org; Michael Lawrence
> (lawrence.michael at gene.com); Vincent Carey (stvjc at channing.harvard.edu)
> *Subject:* Re: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to
> close filehandle on tabix files produced
>
> Hi Malcolm,
>
> I am not sure why this is happening. I haven't been able to reproduce it on my
> system (which I think has a limit of 1024, so I had to increase your test case
> to exceed that). Does this happen when calling bgzip + indexTabix on a file 256
> times? That would help to eliminate the complicated wrappers.
>
> Thanks,
> Michael
>
>
> On Thu, Nov 8, 2012 at 2:32 PM, Cook, Malcolm <MEC at stowers.org
> <mailto:MEC at stowers.org>> wrote:
>
> rtracklayer developers (Michael/Vincent/Robert),
>
> I find that exporting too many BED files with tabix indexing causes an error.
>
> The session following my signature reproduces the error.
>
> It provides sessionInfo() details prior to the code causing the error because
> sessionInfo() FAILS with 'too many open files' after running this code (as does
> anything that opens files).
>
> The error does NOT occur when index=FALSE.  Only when index=TRUE.
>
> I expect that the tabix calls are not cleaning up open file handles correctly.
>
> ulimit -a tells me on my Mac OS X machine that I can have 256 files open.
>
> The bug happens during the 253rd bedfile.
>
> openConnections() returns nothing.
>
> closeAllConnections() does not clean them up.
>
> lsof to list open files at the command line does NOT show them.
>
> Michael(?), you resolved a similar issue I once reported with rtracklayer when
> creating bigBed files:
> https://lists.soe.ucsc.edu/pipermail/genome/2012-February/028343.html
>
> Any suggestions for workarounds?  Any possibility of a quick patch to released
> rtracklayer?
>
> Thanks for rtracklayer!
>
> ~Malcolm Cook
> -----------------------------------------------------------
>
> bash-3.2$ R
>
> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>    Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>  > library(rtracklayer)
> Loading required package: GenomicRanges
> Loading required package: BiocGenerics
>
> Attaching package: 'BiocGenerics'
>
> The following object(s) are masked from 'package:stats':
>
>      xtabs
>
> The following object(s) are masked from 'package:base':
>
>      Filter, Find, Map, Position, Reduce, anyDuplicated, cbind, colnames,
> duplicated, eval, get, intersect, lapply, mapply, mget, order, paste, pmax,
> pmax.int, pmin, pmin.int, rbind, rep.int,
> rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: IRanges
> Warning message:
> package 'GenomicRanges' was built under R version 2.15.2
>
>  > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.18.0   GenomicRanges_1.10.4 IRanges_1.16.4
> BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
>   [1] BSgenome_1.26.1   Biostrings_2.26.2 RCurl_1.95-3      Rsamtools_1.10.1
>   XML_3.95-0.1      bitops_1.0-4.2    parallel_2.15.1   stats4_2.15.1
> tools_2.15.1      zlibbioc_1.4.0
>
>
>  > x<-sapply(sprintf('deleteme_%s.bed',1:1000), function(conn)
> {export(GRanges('X',IRanges(1,2)),conn,index=TRUE);1})
> Error in value[[3L]](cond) : index build failed
>    file: /Volumes/SAN1/Users/mec/deleteme/253.bed.gz
> In addition: Warning message:
> In doTryCatch(return(expr), name, parentenv, handler) :
>    [ti_index_build2] fail to create the index file.
>
>
>  > sessionInfo()
> Error in gzfile(file, "rb") : cannot open the connection
> In addition: Warning message:
> In gzfile(file, "rb") :
>    cannot open compressed file
> '/Library/Frameworks/R.framework/Versions/2.15/Resources/library/rtracklayer/Meta/package.rds',
> probable reason 'Too many open files'
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


