[Bioc-devel] file registry - feedback
Valerie Obenchain
vobencha at fhcrc.org
Tue Mar 25 21:18:27 CET 2014
Hi,
This discussion went off-line and I wanted to give a summary of what we
decided to go with.
We'll create a new package, BiocFile, that has a minimal API.
API:
- 'File' class (virtual, reference class) and constructor
- close / open / isOpen
- import / export
- file registry
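To make the shape of this API concrete, here is a minimal sketch of what a virtual 'File' reference class with open / close / isOpen might look like. It is only an illustration under assumptions: the field names (path, opened) and the DemoFile subclass are hypothetical, and import / export are left out.

```r
library(methods)

# Hypothetical virtual reference class; it cannot be instantiated directly.
File <- setRefClass("File",
    contains = "VIRTUAL",
    fields = list(path = "character", opened = "logical"),
    methods = list(
        open = function() {
            opened <<- TRUE      # a real class would acquire a file handle here
            invisible(.self)
        },
        close = function() {
            opened <<- FALSE     # a real class would release the handle here
            invisible(.self)
        },
        isOpen = function() isTRUE(opened)
    ))

# Hypothetical concrete subclass; it inherits the fields and methods above.
DemoFile <- setRefClass("DemoFile", contains = "File")

f <- DemoFile$new(path = "foo.demo", opened = FALSE)
f$open()
wasOpen <- f$isOpen()
f$close()
isOpenNow <- f$isOpen()
```

Because these are reference classes, open() and close() change the state of f in place, so there is no need to reassign the result.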
We won't require existing *File classes to implement yield but would
'recommend' that new *File classes do. By getting this structure in
place we can guide future *File developments in a consistent direction
even if we can't harmonize all current classes. I'll start work on this
after the release.
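For reference, the registry part could be sketched roughly as follows, based on the proposal quoted further down (an environment holding class / package / regex triplets). This is a sketch, not the GenomicFileViews code: findFileType() is a hypothetical helper name, keying entries by the regex is an assumption, and the lookup only returns the registered entry rather than constructing an object.

```r
# Registry stored in an environment, as in the original proposal.
.fileTypeRegistry <- new.env(parent = emptyenv())

# Register a (class, package, regex) triplet. Keying by regex is an
# assumption of this sketch.
registerFileType <- function(class, package, regex) {
    assign(regex, list(class = class, package = package),
           envir = .fileTypeRegistry)
    invisible(NULL)
}

# Hypothetical lookup helper: return the entry whose regex matches the
# file name, and fail if zero or several entries match.
findFileType <- function(file) {
    patterns <- ls(.fileTypeRegistry, all.names = TRUE)
    hits <- patterns[vapply(patterns, grepl, logical(1), x = file)]
    if (length(hits) != 1L)
        stop("expected exactly one registered type for '", file,
             "', found ", length(hits))
    get(hits, envir = .fileTypeRegistry)
}

registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

bamType <- findFileType("foo.bam")
```

Failing on multiple matches is one way to surface the ambiguity mentioned in the original thread; searching by regex and package name together would be another.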
Thanks again for the input.
Valerie
On 03/11/2014 10:23 PM, Michael Lawrence wrote:
>
>
>
> On Tue, Mar 11, 2014 at 3:33 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> On 03/11/2014 02:52 PM, Hervé Pagès wrote:
>
> On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
>
> Hi Herve,
>
> On 03/10/2014 10:31 PM, Hervé Pagès wrote:
>
> Hi Val,
>
> I think it would help understand the motivations behind
> this proposal
> if you could give an example of a method where the user
> cannot supply
> a file name but has to create a 'File' (or 'FileList')
> object first.
> And how the file registry proposal below would help.
> It looks like you have such an example in the
> GenomicFileViews package.
> Do you think you could give more details?
>
>
> The most recent motivating use case was in creating
> subclasses of
> GenomicFileViews objects (BamFileViews, BigWigFileViews,
> etc.) We wanted
> to have a general constructor, something like
> GenomicFileViews(), that
> would create the appropriate subclass. However to create the
> correct
> subclass we needed to know if the files were bam, bw, fasta etc.
> Recognition of the file type by extension would allow us to
> do this with
> no further input from the user.
>
>
> That helps, thanks!
>
> Having this kind of general constructor sounds like it could
> indeed be
> useful. Would be an opportunity to put all these *File classes
> (the 22
> RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
> subclasses defined in Rsamtools) under the same umbrella (i.e. a
> parent
> virtual class) and use the name of this virtual class (e.g.
> File) for
> the general constructor.
>
> Allowing a registration mechanism to extend the knowledge of
> this File()
> constructor is an implementation detail. I don't see a lot of
> benefit to
> it. Only a package that implements a concrete File subclass would
> actually need to register the new subclass. Sounds easy enough
> to ask
> to whoever has commit access to the File() code to modify it. This
> kind of update might also require adding the name of the package
> where
> the new File subclass is implemented to the Depends/Imports/Suggests
> of the package where File() lives, which is something that cannot be
> done via a registration mechanism.
>
>
> This clean-up of the *File jungle would also be a good opportunity to:
>
> - Choose what we want to do with reference classes: use them for all
> the *File classes or for none of them. (Right now, those defined
> in Rsamtools are reference classes, and those defined in
> rtracklayer are not.)
>
> - Move the I/O functionality currently in rtracklayer to a
> separate package. Based on the number of contributed packages I
> reviewed so far that were trying to reinvent the wheel because
> they had no idea that the I/O function they needed was actually
> in rtracklayer, I'd like to advocate for using a package name
> that makes it very clear that it's all about I/O.
>
>
>
> I can see some benefit in renaming/reorganizing, but if they weren't
> able to perform a simple google search for functionality, I don't think
> the name of the package was the problem. "read gff bioconductor" returns
> rtracklayer as the top hit.
>
>
> H.
>
>
>
> H.
>
>
>
> Val
>
>
> Thanks,
> H.
>
>
> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>
> Hi all,
>
> I'm soliciting feedback on the idea of a general
> file 'registry' that
> would identify file types by their extensions. This
> is similar in
> spirit
> to FileForFormat() in rtracklayer but a more general
> abstraction that
> could be used across packages. The goal is to allow
> a user to supply
> only file name(s) to a method instead of first
> creating a 'File' class
> such as BamFile, FaFile, BigWigFile etc.
>
> A first attempt at this is in the GenomicFileViews
> package
> (https://github.com/Bioconductor/GenomicFileViews).
> A registry (lookup)
> is created as an environment at load time:
>
> .fileTypeRegistry <- new.env(parent=emptyenv())
>
> Files are registered with an information triplet
> consisting of class,
> package and regular expression to identify the
> extension. In
> GenomicFileViews we register FaFileList, BamFileList
> and BigWigFileList
> but any 'File' class can be registered that has a
> constructor of the
> same name.
>
> .onLoad <- function(libname, pkgname)
> {
>     registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>     registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>     registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>     registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
> }
>
> The makeFileType() helper creates the appropriate
> class. This function
> is used behind the scenes to do the lookup and
> coerce to the correct
> 'File' class.
>
> > makeFileType(c("foo.bam", "bar.bam"))
> BamFileList of length 2
> names(2): foo.bam bar.bam
>
> New types can be added at any time with
> registerFileType():
>
> registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
>
>
> Thoughts:
>
> (1) If this sounds generally useful where should it
> live? rtracklayer,
> GenomicFileViews or other? Alternatively it could be
> its own
> lightweight
> package (FileRegister) that creates the registry and
> provides the
> helpers. It would be up to the package authors that
> depend on
> FileRegister to register their own file types at
> load time.
>
> (2) To avoid potential ambiguities maybe searching
> should be by regex
> and package name. Still a work in progress.
>
>
> Valerie
>
> _________________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319