[Bioc-devel] file registry - feedback

Valerie Obenchain vobencha at fhcrc.org
Tue Mar 11 17:57:30 CET 2014


Hi Herve,

On 03/10/2014 10:31 PM, Hervé Pagès wrote:
> Hi Val,
>
> I think it would help understand the motivations behind this proposal
> if you could give an example of a method where the user cannot supply
> a file name but has to create a 'File' (or 'FileList') object first.
> And how the file registry proposal below would help.
> It looks like you have such an example in the GenomicFileViews package.
> Do you think you could give more details?

The most recent motivating use case was creating subclasses of
GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.). We wanted
to have a general constructor, something like GenomicFileViews(), that
would create the appropriate subclass. However, to create the correct
subclass we needed to know whether the files were bam, bw, fasta, etc.
Recognizing the file type by extension would allow us to do this with
no further input from the user.
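
As a rough illustration of that lookup, here is a minimal sketch. It is
not the GenomicFileViews code: the body of registerFileType() is a guess
at one possible implementation, and findFileType() is a hypothetical
helper that only returns the class name. Actually constructing the object
would additionally need the registered package to be installed.

```r
## Hypothetical sketch: names mirror the proposal, but the bodies are
## assumptions, not the GenomicFileViews implementation.
.fileTypeRegistry <- new.env(parent=emptyenv())

## Store one (class, package, regex) triplet, keyed by the regex.
registerFileType <- function(class, package, regex)
{
    assign(regex, list(class=class, package=package, regex=regex),
           envir=.fileTypeRegistry)
    invisible(NULL)
}

## Hypothetical helper: return the single registered class name that
## covers all 'files'; stop if a file is unrecognized or ambiguous, or
## if the files are of mixed types.
findFileType <- function(files)
{
    entries <- mget(ls(.fileTypeRegistry, all.names=TRUE),
                    envir=.fileTypeRegistry)
    classOf <- function(f) {
        hit <- vapply(entries, function(e) grepl(e$regex, f), logical(1))
        unique(vapply(entries[hit], function(e) e$class, character(1)))
    }
    perFile <- lapply(files, classOf)
    if (any(lengths(perFile) != 1L))
        stop("unrecognized or ambiguous file extension")
    classes <- unique(unlist(perFile))
    if (length(classes) != 1L)
        stop("files are of mixed types")
    classes
}

registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

findFileType(c("foo.bam", "bar.bam"))
## [1] "BamFileList"
```

A general constructor could then pick the subclass from the returned
class name without asking the user for the file type.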

Val

>
> Thanks,
> H.
>
>
> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>> Hi all,
>>
>> I'm soliciting feedback on the idea of a general file 'registry' that
>> would identify file types by their extensions. This is similar in spirit
>> to FileForFormat() in rtracklayer but is a more general abstraction that
>> could be used across packages. The goal is to allow a user to supply
>> only file name(s) to a method instead of first creating a 'File' class
>> such as BamFile, FaFile, BigWigFile etc.
>>
>> A first attempt at this is in the GenomicFileViews package
>> (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
>> is created as an environment at load time:
>>
>> .fileTypeRegistry <- new.env(parent=emptyenv())
>>
>> Files are registered with an information triplet consisting of class,
>> package and regular expression to identify the extension. In
>> GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
>> but any 'File' class can be registered that has a constructor of the
>> same name.
>>
>> .onLoad <- function(libname, pkgname)
>> {
>>      registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>>      registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>>      registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>>      registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>> }
>>
>> The makeFileType() helper creates the appropriate class. This function
>> is used behind the scenes to do the lookup and coerce to the correct
>> 'File' class.
>>
>>  > makeFileType(c("foo.bam", "bar.bam"))
>> BamFileList of length 2
>> names(2): foo.bam bar.bam
>>
>> New types can be added at any time with registerFileType():
>>
>> registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
>>
>>
>> Thoughts:
>>
>> (1) If this sounds generally useful where should it live? rtracklayer,
>> GenomicFileViews or other? Alternatively it could be its own lightweight
>> package (FileRegister) that creates the registry and provides the
>> helpers. It would be up to the package authors that depend on
>> FileRegister to register their own file types at load time.
>>
>> (2) To avoid potential ambiguities, maybe searching should be by regex
>> and package name. This is still a work in progress.
>>
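
One way point (2) could look, purely as a sketch: key each entry by
package and regex, and let the lookup take an optional package argument.
The names registry, register() and lookup(), as well as the packages and
classes registered here, are hypothetical and exist only for illustration.

```r
## Hypothetical sketch; 'registry', 'register' and 'lookup' are
## illustrative names, as are the packages and classes registered.
registry <- new.env(parent=emptyenv())

## Key each entry by package and regex so two packages can register
## the same extension without overwriting each other.
register <- function(class, package, regex)
{
    assign(paste(package, regex, sep="::"),
           list(class=class, package=package, regex=regex),
           envir=registry)
    invisible(NULL)
}

## Classes whose regex matches 'file', optionally limited to 'package'.
lookup <- function(file, package=NULL)
{
    entries <- mget(ls(registry), envir=registry)
    keep <- vapply(entries, function(e)
        grepl(e$regex, file) &&
            (is.null(package) || identical(e$package, package)),
        logical(1))
    unique(vapply(entries[keep], function(e) e$class, character(1)))
}

register("AFileList", "pkgA", "\\.bed$")
register("BFileList", "pkgB", "\\.bed$")

lookup("x.bed")                  # both classes match: ambiguous
lookup("x.bed", package="pkgB")  # restricted to one package
```

Searching by regex alone returns both classes here, while adding the
package name narrows the result to one.
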
>>
>> Valerie
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319


