[Bioc-devel] bgzip BUG with too many files [WAS: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to close filehandle on tabix files produced]

Cook, Malcolm MEC at stowers.org
Mon Nov 12 17:36:10 CET 2012


> I think this is now fixed in Rsamtools 1.10.2 / 1.11.9, available in svn now and
> biocLite, hopefully, Saturday 10am Seattle time.

Excellent.  I have confirmed the fix works for me.  
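
For anyone re-checking this on their own system, the `ulimit -n 30` trick from the quoted message below can be wrapped so that the lowered limit applies to a single command only. This is a minimal sketch, not necessarily how the fix was confirmed here; `run_with_fd_limit` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run a command under a lowered open-file limit so that a descriptor
# leak surfaces after a few dozen files rather than hundreds.

# Hypothetical helper: apply the soft limit inside a subshell so the
# calling shell keeps its original limit.
run_with_fd_limit() {
    local limit="$1"
    shift
    ( ulimit -S -n "$limit" && "$@" )
}

# Show the current soft limit, then the limit seen by a child process.
ulimit -S -n
run_with_fd_limit 30 sh -c 'ulimit -n'
```

For example, `run_with_fd_limit 30 Rscript repro.R` would run the bgzip loop under a 30-descriptor limit, where `repro.R` is an assumed file holding the reproducer from the quoted message.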

~Malcolm
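
The workaround mentioned in the quoted message below (system calls for the zipping and tabix indexing) might look roughly like the following. It is a sketch under assumptions: the `bgzip` and `tabix` executables are on the PATH, the input BED files are already sorted, and `compress_and_index` is a hypothetical helper name, not the exact commands used here.

```shell
#!/usr/bin/env bash
# Sketch of the command-line workaround: compress and index outside of R.
# Assumes bgzip and tabix are installed and on PATH.

# Hypothetical helper: bgzip-compress a sorted BED file and build its tabix index.
compress_and_index() {
    local bed="$1"
    # -c writes compressed data to stdout, leaving the input file in place.
    bgzip -c "$bed" > "$bed.gz" || return 1
    # -p bed selects the BED column preset; -f overwrites an existing index.
    tabix -f -p bed "$bed.gz"
}

# Process every BED file given on the command line. Each file is handled by
# short-lived processes, so no descriptors accumulate in the R session.
for bed in "$@"; do
    compress_and_index "$bed" || echo "failed: $bed" >&2
done
```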

> 
> Martin
> 
> On 11/09/2012 07:03 AM, Cook, Malcolm wrote:
> > Michael,
> >
> > For easier testing, on my Mac (OS X) I can dial down the limit on the number
> > of open files using the shell command `ulimit -n 30`, but YMMV depending on
> > OS support.
> >
> > In any case, your suspicions were on target.  R function bgzip seems to be the
> > culprit, and I am changing subject and cc:ing in Martin and Herve accordingly.
> >
> > Martin xor Herve,
> >
> > The problem can be reproduced by just calling bgzip repeatedly; how many
> > calls it takes depends on your value for `ulimit -n`:
> >
> > library(Rsamtools)
> >
> > bed <- system.file("doc", "example.bed", package="rtracklayer")
> >
> > replicate(2000, bgzip(bed, 'delme.now', overwrite=TRUE))
> >
> > My workaround for now is to perform system calls to do the zipping and tabix
> > indexing.  So, no urgency, but a fix would be appreciated.
> >
> > sessionInfo() is as below.
> >
> > Thanks,
> >
> >
> > ~Malcolm
> >
> > *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> > *Sent:* Friday, November 09, 2012 5:52 AM
> > *To:* Cook, Malcolm
> > *Cc:* bioc-devel at r-project.org; Michael Lawrence
> > (lawrence.michael at gene.com); Vincent Carey (stvjc at channing.harvard.edu)
> > *Subject:* Re: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to
> > close filehandle on tabix files produced
> >
> > Hi Malcolm,
> >
> > I am not sure why this is happening. I haven't been able to reproduce it on my
> > system (which I think has a limit of 1024, so I had to increase your test case
> > to exceed that). Does this happen when calling bgzip + indexTabix on a file 256
> > times? That would help to eliminate the complicated wrappers.
> >
> > Thanks,
> > Michael
> >
> >
> > On Thu, Nov 8, 2012 at 2:32 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> >
> > rtracklayer developers (Michael/Vincent/Robert),
> >
> > I find that exporting too many BED files with tabix indexing causes an error.
> >
> > The session following my signature reproduces the error.
> >
> > It provides sessionInfo() details prior to the code causing the error because
> > sessionInfo() FAILS with 'too many open files' after running this code (as does
> > anything that opens files).
> >
> > The error does NOT occur when index=FALSE.  Only when index=TRUE.
> >
> > I expect that the tabix calls are not cleaning up open file handles correctly.
> >
> > `ulimit -n` tells me that on my Mac (OS X) I can have 256 files open.
> >
> > The bug happens during the 253rd bedfile.
> >
> > openConnections() returns nothing.
> >
> > closeAllConnections() does not clean them up.
> >
> > Running lsof at the command line to list open files does NOT show them.
> >
> > Michael(?), you resolved a similar issue I once reported with rtracklayer when
> > creating bigBed files:
> > https://lists.soe.ucsc.edu/pipermail/genome/2012-February/028343.html
> >
> > Any suggestions for workarounds?  Any possibility of a quick patch to released
> > rtracklayer?
> >
> > Thanks for rtracklayer!
> >
> > ~Malcolm Cook
> > -----------------------------------------------------------
> >
> > bash-3.2$ R
> >
> > R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> > Copyright (C) 2012 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
> >
> >    Natural language support but running in an English locale
> >
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and
> > 'citation()' on how to cite R or R packages in publications.
> >
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
> >
> >  > library(rtracklayer)
> > Loading required package: GenomicRanges
> > Loading required package: BiocGenerics
> >
> > Attaching package: 'BiocGenerics'
> >
> > The following object(s) are masked from 'package:stats':
> >
> >      xtabs
> >
> > The following object(s) are masked from 'package:base':
> >
> >      Filter, Find, Map, Position, Reduce, anyDuplicated, cbind, colnames,
> > duplicated, eval, get, intersect, lapply, mapply, mget, order, paste, pmax,
> > pmax.int, pmin, pmin.int, rbind, rep.int,
> > rownames, sapply, setdiff, table, tapply, union, unique
> >
> > Loading required package: IRanges
> > Warning message:
> > package 'GenomicRanges' was built under R version 2.15.2
> >
> >  > sessionInfo()
> > R version 2.15.1 (2012-06-22)
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] rtracklayer_1.18.0   GenomicRanges_1.10.4 IRanges_1.16.4
> > BiocGenerics_0.4.0
> >
> > loaded via a namespace (and not attached):
> >   [1] BSgenome_1.26.1   Biostrings_2.26.2 RCurl_1.95-3      Rsamtools_1.10.1
> >   XML_3.95-0.1      bitops_1.0-4.2    parallel_2.15.1   stats4_2.15.1
> > tools_2.15.1      zlibbioc_1.4.0
> >
> >
> >  > x<-sapply(sprintf('deleteme_%s.bed',1:1000), function(conn)
> > {export(GRanges('X',IRanges(1,2)),conn,index=TRUE);1})
> > Error in value[[3L]](cond) : index build failed
> >    file: /Volumes/SAN1/Users/mec/deleteme/253.bed.gz
> > In addition: Warning message:
> > In doTryCatch(return(expr), name, parentenv, handler) :
> >    [ti_index_build2] fail to create the index file.
> >
> >
> >  > sessionInfo()
> > Error in gzfile(file, "rb") : cannot open the connection
> > In addition: Warning message:
> > In gzfile(file, "rb") :
> >    cannot open compressed file
> > '/Library/Frameworks/R.framework/Versions/2.15/Resources/library/rtracklayer/Meta/package.rds',
> > probable reason 'Too many open files'
> >
> 
> 
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793


