[Bioc-devel] file registry - feedback

Valerie Obenchain vobencha at fhcrc.org
Tue Mar 11 04:46:33 CET 2014


Hi all,

I'm soliciting feedback on the idea of a general file 'registry' that 
would identify file types by their extensions. This is similar in spirit 
to FileForFormat() in rtracklayer but a more general abstraction that 
could be used across packages. The goal is to allow a user to supply 
only file name(s) to a method instead of first creating a 'File' class 
such as BamFile, FaFile, BigWigFile etc.

A first attempt at this is in the GenomicFileViews package 
(https://github.com/Bioconductor/GenomicFileViews). A registry (lookup) 
is created as an environment at load time:

.fileTypeRegistry <- new.env(parent=emptyenv())

Files are registered with an information triplet consisting of class, 
package and regular expression to identify the extension. In 
GenomicFileViews we register FaFileList, BamFileList and BigWigFileList 
but any 'File' class can be registered that has a constructor of the 
same name.

.onLoad <- function(libname, pkgname)
{
     registerFileType("FaFileList", "Rsamtools", "\\.fa$")
     registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
     registerFileType("BamFileList", "Rsamtools", "\\.bam$")
     registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
}

The makeFileType() helper constructs an object of the appropriate class. 
This function is used behind the scenes to do the lookup and coerce to 
the correct 'File' class.

 > makeFileType(c("foo.bam", "bar.bam"))
BamFileList of length 2
names(2): foo.bam bar.bam
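
To make the register-then-lookup idea concrete, here is a minimal sketch 
in base R only. It is not the GenomicFileViews code: the names 
.sketchRegistry, sketchRegisterFileType() and sketchFindFileType() are 
hypothetical, and the sketch stops at finding the (class, package) pair 
rather than calling the real constructor.

```r
## Hypothetical sketch; not the GenomicFileViews implementation.
.sketchRegistry <- new.env(parent=emptyenv())

## Store one (class, package, regex) triplet, keyed by the regex.
sketchRegisterFileType <- function(class, package, regex)
{
    assign(regex, list(class=class, package=package, regex=regex),
           envir=.sketchRegistry)
    invisible(regex)
}

## Return the first registered entry whose regex matches every file name.
sketchFindFileType <- function(fnames)
{
    for (key in ls(.sketchRegistry, all.names=TRUE)) {
        entry <- get(key, envir=.sketchRegistry)
        if (all(grepl(entry$regex, fnames)))
            return(entry)
    }
    stop("no registered file type matches all of: ",
         paste(fnames, collapse=", "))
}

sketchRegisterFileType("BamFileList", "Rsamtools", "\\.bam$")
sketchRegisterFileType("BigWigFileList", "rtracklayer", "\\.bw$")

hit <- sketchFindFileType(c("foo.bam", "bar.bam"))
hit$class    # "BamFileList"
hit$package  # "Rsamtools"
```

Since a registered class is expected to have a constructor of the same 
name, the last step could presumably fetch that constructor from the 
package namespace, e.g. with getExportedValue(hit$package, hit$class), 
and call it on the file names. That part is left out here so the sketch 
runs without Rsamtools or rtracklayer installed.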

New types can be added at any time with registerFileType():

registerFileType("NewClass", "NewPackage", "\\.NewExtension$")


Thoughts:

(1) If this sounds generally useful where should it live? rtracklayer, 
GenomicFileViews or other? Alternatively it could be its own lightweight 
package (FileRegister) that creates the registry and provides the 
helpers. It would be up to the package authors that depend on 
FileRegister to register their own file types at load time.

(2) To avoid potential ambiguities, maybe searching should be by both 
regex and package name. This is still a work in progress.
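
As an illustration of point (2), here is a rough sketch of a lookup that 
takes an optional package name. Everything in it is an assumption made 
for the example: the registry is a plain data.frame, findFileType() is a 
hypothetical helper, and "OtherPkg" / "OtherBamList" are made-up names 
that exist only to create a clash on the ".bam" extension.

```r
## Hypothetical sketch: a registry held as a data.frame of triplets.
## "OtherPkg" and "OtherBamList" are made-up names used to force a clash.
registry <- data.frame(
    class   = c("BamFileList", "OtherBamList", "BigWigFileList"),
    package = c("Rsamtools", "OtherPkg", "rtracklayer"),
    regex   = c("\\.bam$", "\\.bam$", "\\.bw$"),
    stringsAsFactors = FALSE)

## Look up a class by file name, optionally restricted to one package.
## Fails unless exactly one registered type matches.
findFileType <- function(fname, package=NULL)
{
    matched <- vapply(registry$regex, grepl, logical(1), x=fname,
                      USE.NAMES=FALSE)
    hits <- registry[matched, ]
    if (!is.null(package))
        hits <- hits[hits$package == package, ]
    if (nrow(hits) != 1L)
        stop(nrow(hits), " registered types match '", fname, "'")
    hits$class
}

findFileType("foo.bw")                       # "BigWigFileList"
findFileType("foo.bam", package="Rsamtools") # "BamFileList"
## findFileType("foo.bam") stops: two registered types match
```

Failing loudly on an ambiguous match, rather than picking the first hit, 
is one possible design choice; a priority order among packages would be 
another.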


Valerie


