[Bioc-devel] BamFile validation
    Hervé Pagès 
    hpages at fhcrc.org
       
    Tue Jan  8 08:27:27 CET 2013
    
    
  
Hi there,
FWIW system.file() has the 'mustWork' arg for this. Strange name though,
since the man page suggests it only tests for existence, not that the
file can actually be opened for reading.
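
To make the existence vs. readability distinction concrete, here is a
minimal base-R sketch. check_readable() is a hypothetical helper (it is
not part of base R or Rsamtools), and its 'mustWork' argument only
borrows the name from system.file(). It assumes file.access() with
mode = 4 is an adequate test of read permission on your platform.

```r
# Hypothetical helper, not part of Rsamtools or base R: report which paths
# exist AND are readable, optionally failing if any of them is not.
check_readable <- function(paths, mustWork = TRUE) {
    present <- file.exists(paths)
    # file.access() returns 0 on success and -1 on failure;
    # mode = 4 asks for read permission.
    readable <- present & (file.access(paths, mode = 4) == 0)
    if (mustWork && !all(readable)) {
        stop("file(s) missing or not readable: ",
             paste(paths[!readable], collapse = ", "))
    }
    readable
}

# A temporary file stands in for a BAM file; the second path does not exist.
bam_path <- tempfile(fileext = ".bam")
file.create(bam_path)
absent_path <- file.path(tempdir(), "no-such-file.bam")

check_readable(c(bam_path, absent_path), mustWork = FALSE)
```

A package could call something like this on the 'file' and 'index'
paths before constructing a BamFile.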
H.
On 01/07/2013 11:11 PM, Nicolas Delhomme wrote:
> Just to clarify. I don't mean it needs to validate the BAM file (i.e. checking that it's properly formatted), so using file.exists on the provided file paths would be sufficient.
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Genome Biology Computational Support
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
> On 8 Jan 2013, at 02:47, Ryan Thompson wrote:
>
>> Couldn't one test for existence by trying to open the BamFile object, and possibly read one sequence (or maybe just read the header since I guess a valid bam file can contain zero sequences)?
>>
>> On Jan 7, 2013 1:32 PM, "Henrik Bengtsson" <hb at biostat.ucsf.edu> wrote:
>> On Mon, Jan 7, 2013 at 12:32 PM, Nicolas Delhomme <delhomme at embl.de> wrote:
>>> Hi Martin, Marc,
>>>
>>> I'm now implementing the use of BamFile objects in easyRNASeq and I like them. I think it would be very useful if when constructing a BamFile the existence of the path and index could be tested; i.e. this works: BamFile("test.bam","test.bam.bai") although these files do not exist. Is there a reason that this validation is not done? If there is, could a validation parameter be added (set to FALSE by default to keep the current behavior) that would check for the files' existence?
>>
>> Good idea - I propose argument 'mustExist'.
>>
>> My $0.02
>>
>> /Henrik
>>
>>> The same goes for the yieldSize argument, i.e. this works BamFile("test.bam","test.bam.bai",yieldSize=-1), although I'm not sure what a -1 yieldSize means. I can of course do these validations within easyRNASeq, but anyone else building packages on top of BamFile would probably want to do the same...
>>>
>>>
>>> A related point that the documentation currently leaves unclear is what the index filename should be: scanBam expects the index to have the same value as the bam filename (which assumes the user has not renamed the bam.bai file, and you never know what users might be doing... :-S ... ), but the BamFile Rd page says:
>>>
>>> file: A character vector of BAM file paths
>>> index:  A character vector of indices (for BamFile);
>>>
>>> so it's unclear to me what the index character vector should contain.
>>>
>>> Thanks again for this set of classes, they're really handy!
>>>
>>> Here's my sessionInfo:
>>>
>>> R Under development (unstable) (2012-10-02 r60861)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] Rsamtools_1.11.14     Biostrings_2.27.8     GenomicRanges_1.11.21
>>> [4] IRanges_1.17.24       BiocGenerics_0.5.6    BiocInstaller_1.9.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-5   stats4_2.16.0  tools_2.16.0   zlibbioc_1.5.0
>>>
>>> Cheers,
>>>
>>> Nico
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
    
    