On Tue, Mar 11, 2014 at 8:57 PM, Valerie Obenchain <vobencha@fhcrc.org> wrote:

> Hi,
>
>
> On 03/11/14 15:33, Hervé Pagès wrote:
>
>> On 03/11/2014 02:52 PM, Hervé Pagès wrote:
>>
>>> On 03/11/2014 09:57 AM, Valerie Obenchain wrote:
>>>
>>>> Hi Herve,
>>>>
>>>> On 03/10/2014 10:31 PM, Hervé Pagès wrote:
>>>>
>>>>> Hi Val,
>>>>>
>>>>> I think it would help us understand the motivations behind this proposal
>>>>> if you could give an example of a method where the user cannot supply
>>>>> a file name but has to create a 'File' (or 'FileList') object first.
>>>>> And how the file registry proposal below would help.
>>>>> It looks like you have such an example in the GenomicFileViews package.
>>>>> Do you think you could give more details?
>>>>>
>>>>
>>>> The most recent motivating use case was in creating subclasses of
>>>> GenomicFileViews objects (BamFileViews, BigWigFileViews, etc.). We wanted
>>>> to have a general constructor, something like GenomicFileViews(), that
>>>> would create the appropriate subclass. However to create the correct
>>>> subclass we needed to know if the files were bam, bw, fasta etc.
>>>> Recognition of the file type by extension would allow us to do this with
>>>> no further input from the user.
>>>>
>>>
>>> That helps, thanks!
>>>
>>> Having this kind of general constructor sounds like it could indeed be
>>> useful. Would be an opportunity to put all these *File classes (the 22
>>> RTLFile subclasses defined in rtracklayer and the 5 RsamtoolsFile
>>> subclasses defined in Rsamtools) under the same umbrella (i.e. a parent
>>> virtual class) and use the name of this virtual class (e.g. File) for
>>> the general constructor.
>>>
>>> Allowing a registration mechanism to extend the knowledge of this File()
>>> constructor is an implementation detail. I don't see a lot of benefit to
>>> it. Only a package that implements a concrete File subclass would
>>> actually need to register the new subclass. It sounds easy enough to ask
>>> whoever has commit access to the File() code to modify it. This
>>> kind of update might also require adding the name of the package where
>>> the new File subclass is implemented to the Depends/Imports/Suggests
>>> of the package where File() lives, which is something that cannot be
>>> done via a registration mechanism.
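Hervé's alternative, a parent virtual class plus a single File() constructor whose extension-to-class mapping is hard-coded, might look roughly like the sketch below. The class names ToyBamFile and ToyBigWigFile are hypothetical stand-ins, not the real Rsamtools/rtracklayer classes, and the constructor handles one path at a time.

```r
library(methods)

## Hypothetical classes: a virtual parent "File" plus two toy concrete
## subclasses standing in for BamFile and BigWigFile.
setClass("File", contains = "VIRTUAL", slots = c(path = "character"))
setClass("ToyBamFile", contains = "File")
setClass("ToyBigWigFile", contains = "File")

## General constructor: the extension-to-class mapping is hard-coded here,
## so supporting a new type means editing this function. One path only.
File <- function(path)
{
    ## Lower-case the extension so "x.BW" and "x.bw" are treated alike.
    ext <- tolower(tools::file_ext(path))
    cls <- switch(ext,
                  bam = "ToyBamFile",
                  bw = , bigwig = "ToyBigWigFile",
                  stop("unrecognized file extension: '", ext, "'"))
    new(cls, path = path)
}

f <- File("reads.bam")
is(f, "File")   # TRUE: every concrete subclass extends the parent
```

The trade-off against a registry is visible in the switch(): every new type requires a change to File() itself, which is exactly the step Hervé argues is cheap to ask for.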
>>>
>>
>> This clean-up of the *File jungle would also be a good opportunity to:
>>
>>    - Choose what we want to do with reference classes: use them for all
>>      the *File classes or for none of them. (Right now, those defined
>>      in Rsamtools are reference classes, and those defined in
>>      rtracklayer are not.)
>>
>>    - Move the I/O functionality currently in rtracklayer to a
>>      separate package. Based on the number of contributed packages I
>>      reviewed so far that were trying to reinvent the wheel because
>>      they had no idea that the I/O function they needed was actually
>>      in rtracklayer, I'd like to advocate for using a package name
>>      that makes it very clear that it's all about I/O.
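The reason the reference-class question matters is that reference classes and ordinary S4 classes have different copy semantics, so code handling a mix of *File objects behaves differently depending on which kind it gets. A small illustration, using hypothetical classes RefFile and ValFile:

```r
library(methods)

## Hypothetical classes, for illustration only: the same "path" holder
## written once as a reference class and once as an ordinary S4 class.
RefFile <- setRefClass("RefFile", fields = list(path = "character"))
setClass("ValFile", slots = c(path = "character"))

a <- RefFile$new(path = "a.bam")
b <- a                  # b and a refer to the same object
b$path <- "b.bam"
a$path                  # "b.bam": the change is visible through 'a'

x <- new("ValFile", path = "a.bam")
y <- x                  # y behaves as an independent copy
y@path <- "b.bam"
x@path                  # still "a.bam"
```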
>>
>
> Thanks for the suggestions. This re-org sounds good to me. As you say,
> unifying the *File classes in a single package would make them more visible
> to other developers and enforce consistent behavior.
>
> If you aren't in favor of a registration mechanism for 'discovery', how
> should a function with methods for many *File classes (e.g., import())
> handle a character file name? import() uses FileForFormat() to discover the
> file type, makes the *File class and dispatches to the appropriate *File
> method. The registry was an attempt at generalizing this concept.
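The discover-then-dispatch pattern described here can be sketched with a character method that only infers the type and re-dispatches. Everything below (toyImport, toyFileForFormat, ToyBedFile, ToyGFFFile) is hypothetical; rtracklayer's real import() and FileForFormat() take more arguments and handle many more formats.

```r
library(methods)

## Toy stand-ins (hypothetical) for two concrete *File classes.
setClass("ToyBedFile", slots = c(path = "character"))
setClass("ToyGFFFile", slots = c(path = "character"))

## Hypothetical helper playing the role of FileForFormat(): turn a file
## name into a *File object based on its extension.
toyFileForFormat <- function(path)
{
    ext <- tolower(tools::file_ext(path))
    cls <- switch(ext, bed = "ToyBedFile", gff = "ToyGFFFile",
                  stop("cannot infer file type of '", path, "'"))
    new(cls, path = path)
}

setGeneric("toyImport", function(con) standardGeneric("toyImport"))

## One method per *File class does the real work ...
setMethod("toyImport", "ToyBedFile", function(con) paste("BED:", con@path))
setMethod("toyImport", "ToyGFFFile", function(con) paste("GFF:", con@path))

## ... and the character method only discovers the type and re-dispatches.
setMethod("toyImport", "character",
          function(con) toyImport(toyFileForFormat(con)))

toyImport("peaks.bed")   # "BED: peaks.bed"
```

A registry would replace the switch() inside toyFileForFormat() with a table lookup; the dispatch structure stays the same.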
>
>
Honestly, FileForFormat works now, and while it could certainly be
improved, perhaps there are bigger problems to solve?


> What do you think about the use of a registry for Vince's idea of holding
> a digest/path reference to large data but not installing it until it's
> used? Other ideas of how / where this could be stored?
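One reading of Vince's idea is a registry entry that stores only a path and a digest, with the data checked and loaded on first use. The sketch below assumes a local file and an MD5 digest from tools::md5sum(); in a real setting the first access might download or install the data instead. All names (registerLazyData, getLazyData, .lazyRegistry) are hypothetical.

```r
## Hypothetical sketch: each entry records a path and an expected MD5
## digest; the file is verified and read only on first access.
.lazyRegistry <- new.env(parent = emptyenv())

registerLazyData <- function(name, path, md5)
    assign(name, list(path = path, md5 = md5, value = NULL),
           envir = .lazyRegistry)

getLazyData <- function(name)
{
    entry <- get(name, envir = .lazyRegistry)
    if (is.null(entry$value)) {
        ## Verify the digest before reading, then cache the result.
        if (unname(tools::md5sum(entry$path)) != entry$md5)
            stop("checksum mismatch for '", entry$path, "'")
        entry$value <- readLines(entry$path)
        assign(name, entry, envir = .lazyRegistry)
    }
    entry$value
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("chr1", "chr2"), tmp)
registerLazyData("chroms", tmp, unname(tools::md5sum(tmp)))
getLazyData("chroms")   # reads, verifies and caches the two lines
```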
>
>
I think that's an orthogonal problem, but a more important one.


> Val
>
>
>
>
>> H.
>>
>>
>>
>>> H.
>>>
>>>
>>>
>>>> Val
>>>>
>>>>
>>>>> Thanks,
>>>>> H.
>>>>>
>>>>>
>>>>> On 03/10/2014 08:46 PM, Valerie Obenchain wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm soliciting feedback on the idea of a general file 'registry' that
>>>>>> would identify file types by their extensions. This is similar in spirit
>>>>>> to FileForFormat() in rtracklayer but a more general abstraction that
>>>>>> could be used across packages. The goal is to allow a user to supply
>>>>>> only file name(s) to a method instead of first creating a 'File' object
>>>>>> such as a BamFile, FaFile or BigWigFile.
>>>>>>
>>>>>> A first attempt at this is in the GenomicFileViews package
>>>>>> (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
>>>>>> is created as an environment at load time:
>>>>>>
>>>>>> .fileTypeRegistry <- new.env(parent=emptyenv())
>>>>>>
>>>>>> Files are registered with an information triplet consisting of class,
>>>>>> package and a regular expression that identifies the extension. In
>>>>>> GenomicFileViews we register FaFileList, BamFileList and BigWigFileList,
>>>>>> but any 'File' class that has a constructor of the same name can be
>>>>>> registered.
>>>>>>
>>>>>> .onLoad <- function(libname, pkgname)
>>>>>> {
>>>>>>      registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>>>>>>      registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>>>>>>      registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>>>>>>      registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>>>>>> }
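The registration step can be sketched as follows. This is a guess at one possible implementation of registerFileType(), not the GenomicFileViews code; registeredFileTypes() is a hypothetical helper added for inspection. Only strings are stored, so Rsamtools and rtracklayer need not be loaded at registration time.

```r
## Registry environment, created once (in a package: at load time).
.fileTypeRegistry <- new.env(parent = emptyenv())

## Store the (class, package, regex) triplet keyed by its regex, so one
## class can own several extensions (".fa" and ".fasta" -> FaFileList).
registerFileType <- function(class, package, regex)
{
    assign(regex, list(class = class, package = package, regex = regex),
           envir = .fileTypeRegistry)
    invisible(NULL)
}

## Hypothetical helper: everything registered so far, as a data.frame.
registeredFileTypes <- function()
{
    entries <- mget(ls(.fileTypeRegistry), envir = .fileTypeRegistry)
    do.call(rbind, lapply(entries, as.data.frame, stringsAsFactors = FALSE))
}

## In a package these calls would sit inside .onLoad().
registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
```

Keying on the regex means re-registering the same pattern overwrites the earlier entry rather than creating a duplicate.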
>>>>>>
>>>>>> The makeFileType() helper creates an object of the appropriate class.
>>>>>> It is used behind the scenes to do the lookup and coerce to the correct
>>>>>> 'File' class.
>>>>>>
>>>>>>  > makeFileType(c("foo.bam", "bar.bam"))
>>>>>> BamFileList of length 2
>>>>>> names(2): foo.bam bar.bam
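The lookup behind makeFileType() could work roughly as below. To keep the sketch runnable without Rsamtools, each hypothetical entry stores a constructor function directly instead of a class/package pair; the real design would resolve the constructor from the registered names, e.g. with getExportedValue(package, class). The ToyBamFileList and ToyFaFileList classes are hypothetical.

```r
## Hypothetical lookup table: regex -> constructor for a toy list class.
.toyRegistry <- list(
    list(regex = "\\.bam$",
         make = function(x) structure(x, class = "ToyBamFileList")),
    list(regex = "\\.(fa|fasta)$",
         make = function(x) structure(x, class = "ToyFaFileList"))
)

makeFileType <- function(files)
{
    ## Keep the entries whose regex matches every supplied file name.
    hits <- Filter(function(e) all(grepl(e$regex, files)), .toyRegistry)
    if (length(hits) != 1L)
        stop("expected exactly one matching file type, found ", length(hits))
    ## Name the elements by file name, as in the BamFileList output above.
    hits[[1L]]$make(setNames(files, basename(files)))
}

x <- makeFileType(c("foo.bam", "bar.bam"))
class(x)   # "ToyBamFileList"
```

Requiring every file to match the same entry means a mixed vector such as c("a.bam", "b.fa") is rejected rather than silently coerced.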
>>>>>>
>>>>>> New types can be added at any time with registerFileType():
>>>>>>
>>>>>> registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
>>>>>>
>>>>>>
>>>>>> Thoughts:
>>>>>>
>>>>>> (1) If this sounds generally useful, where should it live? rtracklayer,
>>>>>> GenomicFileViews or other? Alternatively it could be its own lightweight
>>>>>> package (FileRegister) that creates the registry and provides the
>>>>>> helpers. It would be up to the authors of packages that depend on
>>>>>> FileRegister to register their own file types at load time.
>>>>>>
>>>>>> (2) To avoid potential ambiguities, maybe lookup should be by both regex
>>>>>> and package name. Still a work in progress.
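Lookup by regex plus package name could be sketched like this: match on the regex first, then narrow by package only when more than one registered type claims the file. The second registry row (OtherBamList in somePkg) and findFileType() are hypothetical, included only to create an ambiguity.

```r
## Hypothetical registry in which two packages claim the same extension.
registry <- data.frame(
    class   = c("BamFileList", "OtherBamList"),
    package = c("Rsamtools", "somePkg"),
    regex   = c("\\.bam$", "\\.bam$"),
    stringsAsFactors = FALSE
)

findFileType <- function(file, registry, package = NULL)
{
    ## Which registered regexes match this (single) file name?
    hit <- vapply(registry$regex, function(re) grepl(re, file), logical(1))
    ## Optionally restrict the matches to one package.
    if (!is.null(package))
        hit <- hit & registry$package == package
    matches <- registry[hit, , drop = FALSE]
    if (nrow(matches) != 1L)
        stop(nrow(matches), " registered types match '", file, "'; ",
             "supply 'package' to disambiguate")
    matches$class
}

findFileType("reads.bam", registry, package = "Rsamtools")  # "BamFileList"
```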
>>>>>>
>>>>>>
>>>>>> Valerie
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


