[Bioc-devel] file registry - feedback
Valerie Obenchain
vobencha at fhcrc.org
Tue Mar 11 04:46:33 CET 2014
Hi all,
I'm soliciting feedback on the idea of a general file 'registry' that
would identify file types by their extensions. This is similar in spirit
to FileForFormat() in rtracklayer, but as a more general abstraction that
could be used across packages. The goal is to allow a user to supply
only file name(s) to a method instead of first creating an instance of a
'File' class such as BamFile, FaFile, BigWigFile etc.
A first attempt at this is in the GenomicFileViews package
(https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
is created as an environment at load time:
.fileTypeRegistry <- new.env(parent=emptyenv())
Files are registered with an information triplet consisting of class,
package and regular expression to identify the extension. In
GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
but any 'File' class can be registered that has a constructor of the
same name.
.onLoad <- function(libname, pkgname)
{
    registerFileType("FaFileList", "Rsamtools", "\\.fa$")
    registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
    registerFileType("BamFileList", "Rsamtools", "\\.bam$")
    registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
}
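As a rough illustration of the registration step described above, here is a
minimal sketch of what a registerFileType() could look like. It is not the
GenomicFileViews implementation: keying the environment on the regex and
storing the triplet as a list are assumptions made for this example.

```r
# Hypothetical sketch; not the GenomicFileViews implementation.
.fileTypeRegistry <- new.env(parent=emptyenv())

registerFileType <- function(class, package, regex)
{
    stopifnot(is.character(class), length(class) == 1L,
              is.character(package), length(package) == 1L,
              is.character(regex), length(regex) == 1L)
    # Assumption: key on the regex so each extension pattern maps to
    # exactly one (class, package, regex) triplet.
    assign(regex, list(class=class, package=package, regex=regex),
           envir=.fileTypeRegistry)
    invisible(NULL)
}

registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

# List the registered patterns.
ls(.fileTypeRegistry, all.names=TRUE)
```

With this layout, registering the same regex twice silently overwrites the
earlier entry, which is one source of the ambiguity raised in point (2) below.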
The makeFileType() helper creates an instance of the appropriate class.
This function is used behind the scenes to do the lookup and coerce the
file names to the correct 'File' class.
> makeFileType(c("foo.bam", "bar.bam"))
BamFileList of length 2
names(2): foo.bam bar.bam
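The lookup-then-construct behaviour shown above could be sketched roughly as
follows. This is an illustration under assumptions, not the package's code:
findFileType() is a hypothetical helper, the registry is filled directly to
keep the example self-contained, and the constructor is fetched with
getExportedValue(), relying on the stated rule that each class has a
constructor of the same name.

```r
# Hypothetical sketch; the real makeFileType() may work differently.
.fileTypeRegistry <- new.env(parent=emptyenv())

# Populate the registry directly to keep this sketch self-contained.
for (entry in list(
        list(class="FaFileList", package="Rsamtools", regex="\\.fa$"),
        list(class="BamFileList", package="Rsamtools", regex="\\.bam$"),
        list(class="BigWigFileList", package="rtracklayer", regex="\\.bw$")))
    assign(entry$regex, entry, envir=.fileTypeRegistry)

# Find the single registered triplet whose regex matches every file name.
findFileType <- function(files)
{
    entries <- as.list(.fileTypeRegistry, all.names=TRUE)
    matches <- vapply(entries, function(e) all(grepl(e$regex, files)),
                      logical(1))
    hits <- entries[matches]
    if (length(hits) != 1L)
        stop("expected exactly one registered type, found ", length(hits))
    hits[[1L]]
}

# Look up the constructor named after the class and call it on the files.
# This step needs the registering package (e.g. Rsamtools) to be installed.
makeFileType <- function(files)
{
    type <- findFileType(files)
    constructor <- getExportedValue(type$package, type$class)
    constructor(files)
}

# The lookup alone runs without any Bioconductor package installed.
type <- findFileType(c("foo.bam", "bar.bam"))
type$class
```

Splitting the lookup from the construction keeps the registry logic testable
on its own; only makeFileType() touches the package that owns the class.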
New types can be added at any time with registerFileType():
registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
Thoughts:
(1) If this sounds generally useful, where should it live? rtracklayer,
GenomicFileViews or other? Alternatively it could be its own lightweight
package (FileRegister) that creates the registry and provides the
helpers. It would be up to the package authors that depend on
FileRegister to register their own file types at load time.
(2) To avoid potential ambiguities, maybe lookups should match on both
regex and package name. This is still a work in progress.
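One possible shape for point (2), sketched under assumptions: the registry is
keyed on package plus regex, and a hypothetical findFileType() accepts an
optional 'package' argument to break ties. The names PkgA, PkgB, BedFileA and
BedFileB are made up for the example and do not refer to real packages.

```r
# Hypothetical sketch of a lookup by regex and package name.
.fileTypeRegistry <- new.env(parent=emptyenv())

registerFileType <- function(class, package, regex)
{
    # Assumption: key on package and regex together, so two packages can
    # register the same extension without overwriting each other.
    key <- paste(package, regex, sep="::")
    assign(key, list(class=class, package=package, regex=regex),
           envir=.fileTypeRegistry)
    invisible(key)
}

findFileType <- function(file, package=NULL)
{
    entries <- as.list(.fileTypeRegistry, all.names=TRUE)
    keep <- vapply(entries, function(e)
        grepl(e$regex, file) &&
            (is.null(package) || identical(e$package, package)),
        logical(1))
    hits <- entries[keep]
    if (length(hits) == 0L)
        stop("no registered file type matches '", file, "'")
    if (length(hits) > 1L)
        stop("ambiguous match for '", file, "'; supply 'package'")
    hits[[1L]]
}

# Two made-up packages claim the same extension.
registerFileType("BedFileA", "PkgA", "\\.bed$")
registerFileType("BedFileB", "PkgB", "\\.bed$")

# Naming the package resolves the ambiguity.
findFileType("regions.bed", package="PkgB")$class
```

Without the 'package' argument the same call stops with an error rather than
silently picking one of the two candidates.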
Valerie