[Bioc-devel] XVector: abstraction
Hervé Pagès
hpages at fhcrc.org
Mon Dec 9 18:30:10 CET 2013
On 12/09/2013 05:39 AM, Michael Lawrence wrote:
> Any thoughts about using mmap(), so that SharedRaw and OnDiskRaw just
> operate on a pointer as the abstraction?
Martin mentioned mmap to me for this project, but I had some concerns
about Windows compatibility. Are there CRAN or BioC packages that use
it? It would be interesting to have a look at them.
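
To make the "pointer as the abstraction" idea concrete, here is a minimal
sketch. It is illustrative Python, not XVector code, and rests on these
assumptions: read_subseq is a hypothetical helper, the bytearray stands in
for a SharedRaw-like in-memory buffer, and the memory-mapped temporary file
stands in for an OnDiskRaw-like one. Python's mmap module exists on both
Unix and Windows, but that says nothing about how a C-level mmap() inside
an R package would behave on Windows.

```python
import mmap
import os
import tempfile


def read_subseq(buf, start, width):
    """Return `width` bytes starting at 0-based offset `start`.

    `buf` may be any sliceable byte buffer: an in-memory bytearray or a
    memory-mapped file. The caller does not need to know which one it is.
    """
    if start < 0 or width < 0 or start + width > len(buf):
        raise IndexError("subsequence out of bounds")
    return bytes(buf[start:start + width])


# In-memory storage (hypothetical stand-in for a SharedRaw-like object).
in_memory = bytearray(b"ACGTACGTNNACGT")

# On-disk storage (hypothetical stand-in for an OnDiskRaw-like object):
# write the same bytes to a temporary file, then map it read-only.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(in_memory)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as on_disk:
            # The same function serves both storage modes.
            from_memory = read_subseq(in_memory, 8, 4)
            from_disk = read_subseq(on_disk, 8, 4)
finally:
    os.remove(path)
```

Both calls return the same four bytes, because the reader only sees a
sliceable buffer and never needs to know where the storage lives.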
H.
>
> Michael
>
>
> On Sun, Dec 8, 2013 at 11:39 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> The OnDiskXRaw virtual class (if this is what you're referring to)
> is still a very early work-in-progress. The idea is to experiment
> with on-disk representation of atomic vectors and direct random access
> to subsequences of the vector. The exact storage mode is implemented by
> concrete subclasses (currently only DirectRaw and SerializedRaw).
> OnDiskXRaw is actually analogous to SharedRaw, except that with the latter
> the "shared" sequence of bytes resides in memory.
>
> If we had "on-disk" support for all atomic vectors, it sounds like it
> would then be easy to support "on-disk" versions of higher-level
> objects like IRanges or GRanges. They would be defined as their
> "in-memory" counterpart except that the slots that are atomic vectors
> in the "in-memory" version would just need to be replaced by "on-disk"
> atomic vectors. "On-disk" versions of DNAString (and even DNAStringSet)
> objects could also easily be implemented e.g. by just making the
> "shared" slot an OnDiskXRaw object instead of a SharedRaw object.
>
> Putting SharedRaw and OnDiskXRaw under the same umbrella (i.e. under
> a virtual class) and using that virtual class to specify the slot of
> higher-level objects like DNAString is tempting but realistically we
> don't operate on an on-disk object like we do on an in-memory object.
>
> Having an "on-disk" version of DNAString with direct random access was
> in fact the initial motivation for OnDiskXRaw. The use case for this
> was to support direct random access in BSgenome objects without having
> to change the way the chromosomes are stored on disk (they're stored
> as serialized raw vectors). I've finally implemented this feature (will
> soon be pushed to BioC devel) but I changed the storage and didn't use
> OnDiskXRaw in the end.
>
> H.
>
>
>
> On 12/05/2013 06:43 AM, Michael Lawrence wrote:
>
> A nice goal for the XVector package would be a full implementation of
> the R vector API on top of the already existing memory-sharing (rather
> than memory-duplicating) data structures. The actual storage mode of
> the data should obviously be abstracted, e.g., on-disk should be
> treated the same as the externalptr representation. Much of the
> implementation will need to be in C, unless we want to pay the price
> of extracting things into ordinary R vectors. Should the abstraction
> therefore be dropped down to the C level, so that the implementations
> can more easily share with each other? Is there anything to gain here
> from the externalVector package?
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319