[Bioc-devel] BamFile validation

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 8 19:53:42 CET 2013

On 01/07/2013 12:32 PM, Nicolas Delhomme wrote:
> Hi Martin, Marc,
> I'm now implementing the use of BamFile objects in easyRNASeq and I like
> them. I think it would be very useful if when constructing a BamFile the
> existence of the path and index could be tested; i.e. this works:
> BamFile("test.bam","test.bam.bai") although these files do not exist. Is
> there a reason that this validation is not done? If there is, could a
> validation parameter be added (set to FALSE by default to keep the current
> behavior) that would check for the files' existence? The same goes for the
> yieldSize argument, i.e. this works
> BamFile("test.bam","test.bam.bai",yieldSize=-1), although I'm not sure what a
> -1 yieldSize means. I can of course do these validations within easyRNASeq,
> but anyone else building packages on top of BamFile would probably want to do
> the same...

I want to be able to specify a BAM file without opening it, and then open it 
later, e.g., in mclapply or after distributing to a cluster. Also, conceptually, 
I want to distinguish between processing an entire BAM file -- provide me with 
something for which isOpen(BamFile("foo")) == FALSE -- versus reading a chunk of 
a BamFile, i.e., already open. So I separated BamFile creation from open().
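
A rough sketch of that separation, assuming the ex1.bam example file that 
ships with Rsamtools; count_records() is a made-up helper for illustration, 
not part of the package:

```r
library(Rsamtools)

# Example BAM file distributed with Rsamtools (assumed to be installed).
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Creating the object only records the path; nothing is opened yet.
bf <- BamFile(fl)
isOpen(bf)

# Hypothetical worker: opens the file where the work happens (e.g. inside
# mclapply or on a cluster node) and closes it again when done.
count_records <- function(bamfile) {
    bamfile <- open(bamfile)
    on.exit(close(bamfile))
    countBam(bamfile)$records
}

n <- count_records(bf)
```

The same unopened bf could be handed to several workers, each of which opens 
its own connection.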

I focus on open() in the above because opening the BAM file is a cheap way to 
validate that the BAM file exists. The file could be local or remote (http or 
ftp, so file.exists isn't sufficient), and even if the file 'exists', as Ryan 
mentions, it needs to actually be a BAM file and so should, e.g., have a header. 
open() allows for all of these possibilities. Also, trying to open a 
non-existent file results in a clear enough error:

 > open(BamFile("sdfs"))
Error in value[[3L]](cond) :
   failed to open BamFile: file(s) do not exist:
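
A package that wants an up-front check could wrap open() itself. A minimal 
sketch, where bam_is_openable() is a hypothetical helper and not something 
Rsamtools provides:

```r
library(Rsamtools)

# Hypothetical helper: TRUE if the file can be opened as a BAM file,
# FALSE (with a message) if open() signals an error.
bam_is_openable <- function(path) {
    tryCatch({
        bf <- open(BamFile(path))
        close(bf)
        TRUE
    }, error = function(e) {
        message("cannot open BAM file: ", conditionMessage(e))
        FALSE
    })
}

ok <- bam_is_openable("sdfs")   # "sdfs" does not exist, as in the error above
```

Because the check goes through open(), it applies equally to local and remote 
paths and rejects files that exist but are not BAM files.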

So against the votes of the other contributors to this thread, I haven't made a 
change. Sorry about that.

I added a check that yieldSize is a non-negative scalar integer, or NA.
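
For anyone wanting the same rule in their own package, it amounts to roughly 
the following. This is a base-R sketch of the rule as stated, not the code in 
Rsamtools, and is_valid_yield_size() is a made-up name:

```r
# Hypothetical validator: accepts a single NA or a single non-negative
# whole number; rejects everything else.
is_valid_yield_size <- function(x) {
    if (length(x) != 1L) return(FALSE)
    if (is.na(x)) return(is.logical(x) || is.numeric(x))
    is.numeric(x) && is.finite(x) && x >= 0 && x == trunc(x)
}

is_valid_yield_size(NA)         # accepted: no yield size set
is_valid_yield_size(1000000)    # accepted
is_valid_yield_size(-1)         # rejected: negative
is_valid_yield_size(c(10, 20))  # rejected: not a scalar
```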

> A related point unclear at the moment in the documentation is what the index
> filename should be: i.e. scanBam expects as the index the same value as for
> the bam filename (that assumes the user has not renamed his bam.bai file  and
> you never know what users might be doing... :-S ... ) but the BamFile Rd page
> says:
> file: A character vector of BAM file paths
> index: A character vector of indices (for BamFile);
> so it's unclear to me what the index character vector should contain.

I tried to clarify that; it's just a character vector containing the path to the 
index file. Generally, the code tries not to care whether the index file is 
specified with or without a '.bai' extension.


> Thanks again for this set of class, they're really handy!
> Here's my sessionInfo:
> R Under development (unstable) (2012-10-02 r60861) Platform:
> x86_64-apple-darwin10.8.0 (64-bit)
> locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
> attached base packages: [1] parallel  stats     graphics  grDevices utils
> datasets  methods [8] base
> other attached packages: [1] Rsamtools_1.11.14     Biostrings_2.27.8
> GenomicRanges_1.11.21 [4] IRanges_1.17.24       BiocGenerics_0.5.6
> BiocInstaller_1.9.6
> loaded via a namespace (and not attached): [1] bitops_1.0-5   stats4_2.16.0
> tools_2.16.0   zlibbioc_1.5.0
> Cheers,
> Nico
> --------------------------------------------------------------- Nicolas
> Delhomme
> Genome Biology Computational Support
> European Molecular Biology Laboratory
> Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 -
> Postfach 10.2209 69102 Heidelberg, Germany
> _______________________________________________ Bioc-devel at r-project.org
> mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
