[Bioc-devel] XVector: abstraction

Hervé Pagès hpages at fhcrc.org
Mon Dec 9 18:30:10 CET 2013


On 12/09/2013 05:39 AM, Michael Lawrence wrote:
> Any thoughts about using mmap(), so that SharedRaw and OnDiskRaw just
> operate on a pointer as the abstraction?

Martin mentioned mmap to me for this project but I had some concerns
about Windows compatibility. Are there CRAN or BioC packages that use
it? Would be interesting to have a look at them.
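
For reference, the pointer abstraction Michael suggests might look roughly like the sketch below on POSIX systems. This is only an illustration: the `raw_view` type and the `raw_view_*` functions are hypothetical names, not part of XVector. Windows has no mmap(); a port would need CreateFileMapping()/MapViewOfFile() behind the same interface, which is the compatibility concern raised above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type: a read-only view on a sequence of bytes. Callers only
 * see a pointer and a length, whether the bytes live in memory or on disk. */
typedef struct {
    const unsigned char *data;
    size_t length;
    int is_mapped;  /* 1 if data comes from mmap() and must be unmapped */
} raw_view;

/* View on bytes that are already in memory (SharedRaw-like). No copy. */
static raw_view raw_view_from_memory(const unsigned char *bytes, size_t n)
{
    raw_view v = { bytes, n, 0 };
    return v;
}

/* View on the contents of a file (on-disk case). Pages are loaded lazily
 * by the OS. Returns 0 on success, -1 on failure. */
static int raw_view_from_file(const char *path, raw_view *out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    out->data = p;
    out->length = (size_t) st.st_size;
    out->is_mapped = 1;
    return 0;
}

/* Release the view; only mapped views own any resource. */
static void raw_view_close(raw_view *v)
{
    assert(v != NULL);
    if (v->is_mapped)
        munmap((void *) v->data, v->length);
    v->data = NULL;
    v->length = 0;
    v->is_mapped = 0;
}

/* Copy the subsequence [start, start + width) into dest. The same code
 * serves both back ends. Returns -1 if the range is out of bounds. */
static int raw_view_subseq(const raw_view *v, size_t start, size_t width,
                           unsigned char *dest)
{
    if (start > v->length || width > v->length - start)
        return -1;
    memcpy(dest, v->data + start, width);
    return 0;
}
```

With such a view, extracting 3 bytes at offset 2 is the same raw_view_subseq() call whether the view was created from a memory buffer or from a file.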

H.

>
> Michael
>
>
> On Sun, Dec 8, 2013 at 11:39 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>     The OnDiskXRaw virtual class (if this is what you're referring to)
>     is still a very early work-in-progress. The idea is to experiment
>     with on-disk representation of atomic vectors and direct random access
>     to subsequences of the vector. The exact storage mode is implemented by
>     concrete subclasses (currently only DirectRaw and SerializedRaw).
>     OnDiskXRaw is actually analogous to SharedRaw except that with the latter
>     the "shared" sequence of bytes resides in memory.
>
>     If we had "on-disk" support for all atomic vectors, it sounds like it
>     would then be easy to support "on-disk" versions of higher-level
>     objects like IRanges or GRanges. They would be defined as their
>     "in-memory" counterpart except that the slots that are atomic vectors
>     in the "in-memory" version would just need to be replaced by "on-disk"
>     atomic vectors. "On-disk" versions of DNAString (and even DNAStringSet)
>     objects could also easily be implemented e.g. by just making the
>     "shared" slot an OnDiskXRaw object instead of a SharedRaw object.
>
>     Putting SharedRaw and OnDiskXRaw under the same umbrella (i.e. under
>     a virtual class) and using that virtual class to specify the slot of
>     higher-level objects like DNAString is tempting but realistically we
>     don't operate on an on-disk object like we do on an in-memory object.
>
>     Having an "on-disk" version of DNAString with direct random access was
>     in fact the initial motivation for OnDiskXRaw. The use case for this
>     was to support direct random access in BSgenome objects without having
>     to change the way the chromosomes are stored on disk (they're stored
>     as serialized raw vectors). I've finally implemented this feature (will
>     soon be pushed to BioC devel) but I changed the storage and didn't use
>     OnDiskXRaw in the end.
>
>     H.
>
>
>
>     On 12/05/2013 06:43 AM, Michael Lawrence wrote:
>
>         A nice goal for the XVector package would be full implementation of
>         the R vector API on top of the already existing memory-sharing
>         (rather than memory-duplicating) data structures. The actual storage
>         mode of the data should obviously be abstracted, e.g., on-disk should
>         be treated the same as the externalptr representation. Much of the
>         implementation will need to be in C, unless we want to pay the price
>         of extracting things into ordinary R vectors. Should the abstraction
>         therefore be dropped down to the C level, so that the implementations
>         can more easily share code with each other? Is there anything to
>         gain here from the externalVector package?
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


