[Bioc-devel] bgzip BUG with too many files [WAS: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to close filehandle on tabix files produced]
Cook, Malcolm
MEC at stowers.org
Mon Nov 12 17:36:10 CET 2012
> I think this is now fixed in Rsamtools 1.10.2 / 1.11.9, available in svn now and
> biocLite, hopefully, Saturday 10am Seattle time.
Excellent. I have confirmed the fix works for me.
~Malcolm
>
> Martin
>
> On 11/09/2012 07:03 AM, Cook, Malcolm wrote:
> > Michael,
> >
> > For easier testing, with my mac OSX I can dial down the limit on number of files
> > using shell command `ulimit -n 30` but YMMV depending on OS support.
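
A minimal sketch of that testing setup, assuming a POSIX-style shell. Lowering the limit inside a subshell keeps the change from sticking to the login shell. The helper name `with_fd_limit` is made up for illustration and is not part of any package discussed here.

```shell
# Hypothetical helper: run a command with a lowered open-file limit.
# The parentheses start a subshell, so the calling shell's limit is untouched.
with_fd_limit() {
    limit="$1"
    shift
    ( ulimit -n "$limit" && "$@" )
}

# Print the limit as seen by a child process started under the lowered limit.
with_fd_limit 30 sh -c 'ulimit -n'
```

Replacing the last command with `with_fd_limit 30 R --vanilla` would start an R session that runs out of descriptors much sooner, which makes the bgzip loop fail after only a handful of calls.
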
> >
> > In any case, your suspicions were on target. R function bgzip seems to be the
> > culprit, and I am changing subject and cc:ing in Martin and Herve accordingly.
> >
> > Martin xor Herve,
> >
> > The problem can be reproduced by just calling bgzip repeatedly; how many calls
> > it takes depends on your value for `ulimit -n`:
> >
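> > library(Rsamtools)
> >
> > # example BED file shipped with rtracklayer
> > bed <- system.file("doc", "example.bed", package="rtracklayer")
> >
> > # compress the same file 2000 times, overwriting 'delme.now' each time
> > replicate(2000, bgzip(bed, 'delme.now', overwrite=TRUE))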
> >
> > My workaround for now is to perform system calls to do the zipping and tabix
> > indexing, so there is no urgency.
> >
> > sessionInfo() is as below.
> >
> > Thanks,
> >
> >
> > ~Malcolm
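
The system-call workaround mentioned above might look roughly like the following. This is only a sketch: it assumes the `bgzip` and `tabix` command-line tools are on the PATH and that the BED file is already sorted by chromosome and start position, and the function name `bgzip_and_index` is hypothetical.

```shell
# Hypothetical helper: compress a BED file with the bgzip command-line tool
# and build a tabix index for it, instead of calling Rsamtools from R.
bgzip_and_index() {
    bed="$1"
    if [ ! -f "$bed" ]; then
        echo "no such file: $bed" >&2
        return 1
    fi
    # -c writes the compressed data to stdout, leaving the input file in place.
    bgzip -c "$bed" > "$bed.gz" || return 1
    # -f overwrites an existing index; -p bed selects the BED column preset.
    tabix -f -p bed "$bed.gz"
}
```

From R, the same two commands could be run through `system()`. Each call runs in its own short-lived process, so any descriptors it opens are released when that process exits rather than accumulating in the R session.
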
> >
> > *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> > *Sent:* Friday, November 09, 2012 5:52 AM
> > *To:* Cook, Malcolm
> > *Cc:* bioc-devel at r-project.org; Michael Lawrence
> > (lawrence.michael at gene.com); Vincent Carey (stvjc at channing.harvard.edu)
> > *Subject:* Re: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to
> > close filehandle on tabix files produced
> >
> > Hi Malcolm,
> >
> > I am not sure why this is happening. I haven't been able to reproduce it on my
> > system (which I think has a limit of 1024, so I had to increase your test case
> > to exceed that). Does this happen when calling bgzip + indexTabix on a file 256
> > times? That would help to eliminate the complicated wrappers.
> >
> > Thanks,
> > Michael
> >
> >
> > On Thu, Nov 8, 2012 at 2:32 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> >
> > rtracklayer developers (Michael/Vincent/Robert),
> >
> > I find that exporting too many bed files with tabix indexing causes an error.
> >
> > The session following my signature reproduces the error.
> >
> > It provides sessionInfo() details prior to the code causing the error because
> > sessionInfo() FAILS with 'too many open files' after running this code (as does
> > anything that opens files).
> >
> > The error does NOT occur when index=FALSE. Only when index=TRUE.
> >
> > I expect that the tabix calls are not cleaning up open file handles correctly.
> >
> > `ulimit -n` tells me on my Mac OS X machine that I can have 256 files open.
> >
> > The bug happens during the 253rd bedfile.
> >
> > openConnections() returns nothing.
> >
> > closeAllConnections() does not clean them up.
> >
> > lsof to list open files at the command line does NOT show them.
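
For watching the descriptor count of a running process from the command line, something like the following could be used. It is a sketch under assumptions: `count_open_fds` is a made-up name, the `/proc` branch applies to Linux only, and the `lsof` fallback also counts non-descriptor entries such as the working directory. Since the report above notes that lsof did not list the leaked handles, the number may well miss them.

```shell
# Hypothetical helper: count the open files a process holds.
# Uses /proc on Linux and falls back to lsof elsewhere (for example Mac OS X).
count_open_fds() {
    pid="$1"
    if [ -d "/proc/$pid/fd" ]; then
        # One entry per open descriptor.
        ls "/proc/$pid/fd" | wc -l
    else
        # Drop the header line, then count the remaining entries.
        lsof -p "$pid" 2>/dev/null | tail -n +2 | wc -l
    fi
}

# Example: open files held by the current shell.
count_open_fds "$$"
```

Passing the process ID of the R session (as returned by `Sys.getpid()` in R) instead of `$$` would report on that session.
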
> >
> > Michael(?), you resolved a similar issue I once reported with rtracklayer when
> > creating bigBed files :
> > https://lists.soe.ucsc.edu/pipermail/genome/2012-February/028343.html
> >
> > Any suggestions for workarounds? Any possibility of a quick patch to released
> > rtracklayer?
> >
> > Thanks for rtracklayer!
> >
> > ~Malcolm Cook
> > -----------------------------------------------------------
> >
> > bash-3.2$ R
> >
> > R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> > Copyright (C) 2012 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
> >
> > Natural language support but running in an English locale
> >
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and
> > 'citation()' on how to cite R or R packages in publications.
> >
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
> >
> > > library(rtracklayer)
> > Loading required package: GenomicRanges
> > Loading required package: BiocGenerics
> >
> > Attaching package: 'BiocGenerics'
> >
> > The following object(s) are masked from 'package:stats':
> >
> > xtabs
> >
> > The following object(s) are masked from 'package:base':
> >
> > Filter, Find, Map, Position, Reduce, anyDuplicated, cbind, colnames,
> > duplicated, eval, get, intersect, lapply, mapply, mget, order, paste, pmax,
> > pmax.int, pmin, pmin.int, rbind, rep.int, rownames, sapply, setdiff, table,
> > tapply, union, unique
> >
> > Loading required package: IRanges
> > Warning message:
> > package 'GenomicRanges' was built under R version 2.15.2
> >
> > > sessionInfo()
> > R version 2.15.1 (2012-06-22)
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] rtracklayer_1.18.0 GenomicRanges_1.10.4 IRanges_1.16.4
> > BiocGenerics_0.4.0
> >
> > loaded via a namespace (and not attached):
> > [1] BSgenome_1.26.1 Biostrings_2.26.2 RCurl_1.95-3 Rsamtools_1.10.1
> > XML_3.95-0.1 bitops_1.0-4.2 parallel_2.15.1 stats4_2.15.1
> > tools_2.15.1 zlibbioc_1.4.0
> >
> >
> > > x<-sapply(sprintf('deleteme_%s.bed',1:1000), function(conn)
> > {export(GRanges('X',IRanges(1,2)),conn,index=TRUE);1})
> > Error in value[[3L]](cond) : index build failed
> > file: /Volumes/SAN1/Users/mec/deleteme/253.bed.gz
> > In addition: Warning message:
> > In doTryCatch(return(expr), name, parentenv, handler) :
> > [ti_index_build2] fail to create the index file.
> >
> >
> > > sessionInfo()
> > Error in gzfile(file, "rb") : cannot open the connection
> > In addition: Warning message:
> > In gzfile(file, "rb") :
> > cannot open compressed file
> > '/Library/Frameworks/R.framework/Versions/2.15/Resources/library/rtracklayer/Meta/package.rds',
> > probable reason 'Too many open files'
> >
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793