[Bioc-devel] file registry - feedback
hpages at fhcrc.org
Tue Mar 11 22:52:32 CET 2014
On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
> Hi Herve,
> On 03/10/2014 10:31 PM, Hervé Pagès wrote:
>> Hi Val,
>> I think it would help understand the motivations behind this proposal
>> if you could give an example of a method where the user cannot supply
>> a file name but has to create a 'File' (or 'FileList') object first.
>> And how the file registry proposal below would help.
>> It looks like you have such an example in the GenomicFileViews package.
>> Do you think you could give more details?
> The most recent motivating use case was in creating subclasses of
> GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.). We wanted
> to have a general constructor, something like GenomicFileViews(), that
> would create the appropriate subclass. However to create the correct
> subclass we needed to know if the files were bam, bw, fasta etc.
> Recognition of the file type by extension would allow us to do this with
> no further input from the user.
That helps, thanks!
Having this kind of general constructor sounds like it could indeed be
useful. Would be an opportunity to put all these *File classes (the 22
RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
subclasses defined in Rsamtools) under the same umbrella (i.e. a parent
virtual class) and use the name of this virtual class (e.g. File) for
the general constructor.
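
To make this concrete, here is a rough sketch of what I have in mind. The
class definitions are stand-ins (the real BamFile, FaFile and BigWigFile
classes live in Rsamtools and rtracklayer and carry more slots), and the
extension table and the body of File() are assumptions made for
illustration only, not existing code:

```r
library(methods)

## Hypothetical umbrella class plus stand-ins for the real *File classes.
setClass("File", representation("VIRTUAL", path="character"))
setClass("BamFile", contains="File")
setClass("FaFile", contains="File")
setClass("BigWigFile", contains="File")

## Hard-coded extension -> class table (illustrative, not exhaustive).
.FILE_TYPES <- c("\\.bam$"="BamFile",
                 "\\.fa$"="FaFile",
                 "\\.fasta$"="FaFile",
                 "\\.bw$"="BigWigFile")

## General constructor: picks the concrete subclass from the extension.
File <- function(path)
{
    stopifnot(is.character(path), length(path) == 1L)
    hit <- vapply(names(.FILE_TYPES), grepl, logical(1), x=path)
    if (sum(hit) != 1L)
        stop("cannot determine file type of '", path, "'")
    new(.FILE_TYPES[[which(hit)]], path=path)
}
```

With this, File("foo.bam") would return a BamFile instance and
File("foo.txt") would fail with an error. Supporting a new extension
means adding one entry to the table.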
Allowing a registration mechanism to extend the knowledge of this File()
constructor is an implementation detail. I don't see a lot of benefit to
it. Only a package that implements a concrete File subclass would
actually need to register the new subclass. It sounds easy enough to ask
whoever has commit access to the File() code to modify it. This
kind of update might also require adding the name of the package where
the new File subclass is implemented to the Depends/Imports/Suggests
of the package where File() lives, which is something that cannot be
done via a registration mechanism.
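
For comparison, my understanding of the registration approach is roughly
the following. This is a sketch only: registerFileType() follows the
signature in Val's description, but the bodies are my guess and not the
GenomicFileViews code, and lookupFileType() is a made-up helper that
returns the registered class and package names instead of constructing
an object:

```r
## Registry kept in an environment, keyed by regular expression.
.fileTypeRegistry <- new.env(parent=emptyenv())

registerFileType <- function(class, package, regex)
{
    stopifnot(is.character(class), length(class) == 1L,
              is.character(package), length(package) == 1L,
              is.character(regex), length(regex) == 1L)
    assign(regex, list(class=class, package=package),
           envir=.fileTypeRegistry)
    invisible(NULL)
}

## Hypothetical helper: find the single registered entry matching 'path'.
lookupFileType <- function(path)
{
    regexes <- ls(.fileTypeRegistry, all.names=TRUE)
    hit <- vapply(regexes, grepl, logical(1), x=path)
    if (sum(hit) != 1L)
        stop("no unique registered file type for '", path, "'")
    get(regexes[hit], envir=.fileTypeRegistry)
}

registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
lookupFileType("foo.bam")$class
```

Note that the package name is only a string in the registry. Nothing
here guarantees that the package is installed or loaded when the lookup
succeeds, which is the dependency problem mentioned above.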
>> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>>> Hi all,
>>> I'm soliciting feedback on the idea of a general file 'registry' that
>>> would identify file types by their extensions. This is similar in spirit
>>> to FileForFormat() in rtracklayer but a more general abstraction that
>>> could be used across packages. The goal is to allow a user to supply
>>> only file name(s) to a method instead of first creating a 'File' class
>>> such as BamFile, FaFile, BigWigFile etc.
>>> A first attempt at this is in the GenomicFileViews package
>>> (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
>>> is created as an environment at load time:
>>> .fileTypeRegistry <- new.env(parent=emptyenv())
>>> Files are registered with an information triplet consisting of class,
>>> package and regular expression to identify the extension. In
>>> GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
>>> but any 'File' class can be registered that has a constructor of the
>>> same name.
>>> .onLoad <- function(libname, pkgname)
>>> {
>>>     registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>>>     registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>>>     registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>>>     registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>>> }
>>> The makeFileType() helper creates the appropriate class. This function
>>> is used behind the scenes to do the lookup and coerce to the correct
>>> 'File' class.
>>> > makeFileType(c("foo.bam", "bar.bam"))
>>> BamFileList of length 2
>>> names(2): foo.bam bar.bam
>>> New types can be added at any time with registerFileType():
>>> registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
>>> (1) If this sounds generally useful where should it live? rtracklayer,
>>> GenomicFileViews or other? Alternatively it could be its own lightweight
>>> package (FileRegister) that creates the registry and provides the
>>> helpers. It would be up to the package authors that depend on
>>> FileRegister to register their own file types at load time.
>>> (2) To avoid potential ambiguities maybe searching should be by regex
>>> and package name. Still a work in progress.
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319