[Bioc-devel] Base class for interaction data - expressions of interest

Aaron Lun alun at wehi.edu.au
Mon Nov 16 11:31:33 CET 2015

Thanks for the comment Nadhir.

I had considered the use of a sparse matrix class. I didn't implement it 
originally because truly sparse interaction data would be better 
represented by simply working with the pairwise format in the 
InteractionSet. You need the row/column indices to pass to the 
sparseMatrix constructor anyway, so a memory-efficient algorithm for, 
say, compartment identification could just use those indices directly.
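To make the point about the pairwise format concrete, here is a minimal Python sketch. It is not InteractionSet code; the function name bin_coverage and the convention of 0-based bins with anchor1 <= anchor2 are assumptions for illustration. Each interaction is an (anchor1, anchor2, count) triplet, i.e. the same three vectors one would pass as i, j and x to Matrix's sparseMatrix() in R (where they are 1-based). A per-bin total, of the kind a compartment or normalization step might start from, can then be accumulated in one pass over the triplets without ever allocating the bin-by-bin matrix.

```python
def bin_coverage(anchor1, anchor2, counts, n_bins):
    """Total count per bin, computed directly from (anchor1, anchor2, count)
    triplets instead of from an n_bins x n_bins matrix.

    Assumed (hypothetical) convention: bins are 0-based and each unordered
    bin pair is stored once with anchor1 <= anchor2, so the result equals
    the row sums of the full symmetric contact matrix.
    """
    coverage = [0] * n_bins
    for a, b, c in zip(anchor1, anchor2, counts):
        coverage[a] += c      # contribution to row a
        if b != a:
            coverage[b] += c  # mirrored contribution to row b (off-diagonal only)
    return coverage
```

For example, bin_coverage([0, 0, 1, 2], [0, 2, 2, 3], [5, 1, 3, 2], 4) gives [6, 3, 6, 2], the same as the row sums of the corresponding symmetric 4 x 4 matrix, while using memory proportional to the number of bins plus the number of interactions.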

Most existing algorithms for this (e.g., k-means or hierarchical 
clustering) won't operate natively on a sparseMatrix, and I suspect 
they'll just call as.matrix() and convert it to a full matrix. Obviously, 
this would defeat the purpose of using a sparse matrix. So, if you have 
to rewrite the algorithms anyway, you might as well rewrite them in a 
manner that avoids needing sparseMatrix() as a middleman.

Nonetheless, it's a good point about memory usage. I'll have a think 
about it; sparseMatrix() would help a bit, but as coverage increases in 
these experiments, the matrix will probably become fairly dense (even if 
it's just counts of 1 for some bin pairs). Even now, compartment 
detection involves such large bins that sparseness usually isn't 
observed. Perhaps big.matrix() from the bigmemory package might be a 
better choice.
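A rough memory comparison shows where the crossover lies. The sketch below (Python, for illustration only) assumes 8-byte doubles for counts and 4-byte integers for indices, as in R, and models the sparse case on the compressed-sparse-column layout of Matrix's dgCMatrix (one value and one row index per non-zero, plus column pointers); R object overhead is ignored, and the function names are hypothetical. Under those assumptions the sparse form stops saving memory once roughly two-thirds of the bin pairs are non-zero.

```python
DOUBLE_BYTES = 8  # assumed size of one stored count (R double)
INT_BYTES = 4     # assumed size of one index (R integer)


def dense_bytes(n_bins):
    """Payload of a dense n_bins x n_bins matrix of doubles."""
    return DOUBLE_BYTES * n_bins * n_bins


def csc_bytes(n_bins, nnz):
    """Approximate payload of a compressed-sparse-column matrix:
    one double and one row index per non-zero entry, plus
    n_bins + 1 column pointers. Object overhead is ignored."""
    return (DOUBLE_BYTES + INT_BYTES) * nnz + INT_BYTES * (n_bins + 1)


def break_even_density(n_bins):
    """Fraction of non-zero cells at which csc_bytes equals dense_bytes."""
    nnz = (dense_bytes(n_bins) - INT_BYTES * (n_bins + 1)) / (DOUBLE_BYTES + INT_BYTES)
    return nnz / (n_bins * n_bins)
```

With 1000 bins, for instance, the dense matrix needs 8,000,000 bytes, a sparse one with 10% of cells filled needs 1,204,004 bytes, and at 70% density the sparse form is already the larger of the two.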



On 16/11/15 09:58, DJEKIDEL MOHAMED NADHIR wrote:
> Hi Aaron,
> Sounds like a great initiative.
> I just have some comments about the ContactMatrix-Class.
> I think that, with increasing Hi-C resolution, using the matrix class
> will consume a lot of memory.
> Maybe sparseMatrix from the Matrix package would have a smaller footprint.
> It can also be manipulated in C++ using RcppEigen, if, for example, you
> plan to add functionality such as AB domains or insulation scores, etc.
> Regards,
> - Nadhir
> On Mon, Nov 16, 2015 at 5:33 PM, Aaron Lun <alun at wehi.edu.au> wrote:
>     Hello all,
>     I thought I might give an update on the state of affairs for the
>     InteractionSet package. Currently, there are three classes:
>     - the GInteractions class, inheriting from Vector and intended to
>     represent pairwise interactions between genomic regions (based on
>     suggestions from Malcolm Perry and Liz Ing-Simmons).
>     - the InteractionSet class, inheriting from SummarizedExperiment0
>     and containing a GInteractions object; intended to store
>     experimental data about pairwise interactions (one interaction per row).
>     - the ContactMatrix class, inheriting from Annotated and storing
>     data in matrix form (where rows/columns represent genomic regions).
>     Getters, setters, conversion methods between classes, distance
>     calculation methods and overlap methods have been implemented. Man
>     pages and "testthat" scripts have also been written. It's still
>     missing a vignette, though it should be easy enough to write one.
>     All in all, I think it's a solid first draft. Any comments would be
>     appreciated.
>     Cheers,
>     Aaron
>     On 08/11/15 19:31, Aaron Lun wrote:
>         Okay, some meat and bones are on GitHub now:
>         https://github.com/LTLA/InteractionSet
>         The idea is to represent genomic interactions as pairs of genomic
>         regions, using indices to point to a common GRanges object (a la
>         Hits, though I haven't used that explicitly due to the presence of
>         additional constraints on the indices). Data for each interaction
>         is stored using a SummarizedExperiment framework (one row per
>         interaction).
>         With regard to the methods, most of the low-hanging fruit has been
>         implemented, courtesy of inheriting from SummarizedExperiment0.
>         I'll add proper unit tests over the coming week. It currently
>         passes R CMD check okay, except for a warning about ":::" in the
>         cbind/rbind definitions (callNextMethod() didn't seem to work
>         inside those methods, and I didn't want to rewrite the SE0
>         binding methods).
>         Any thoughts appreciated.
>         - Aaron
>         On 07/11/15 19:33, Morgan, Martin wrote:
>             Just to say that this is a great idea. If this starts as a
>             GitHub package (or in svn; we can create a location for you if
>             you'd like), I am sure that I and others would be happy to try
>             to provide any guidance/insight. The main design principles are
>             probably to reuse as much as possible from existing classes,
>             especially the S4Vectors/GRanges world, and to integrate
>             metadata as appropriate (like SummarizedExperiment, for
>             instance).
>             Martin
>             ________________________________________
>             From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf
>             of Aaron Lun [alun at wehi.edu.au]
>             Sent: Thursday, November 05, 2015 12:27 PM
>             To: bioc-devel at r-project.org
>             Subject: Re: [Bioc-devel] Base class for interaction data -
>             expressions of interest
>             There's a growing number of Bioconductor packages dealing with
>             interaction data (diffHic, GenomicInteractions and HiTC, to name
>             a few, and probably more in the future). Each of these packages
>             defines its own class to store interaction data: DIList for
>             diffHic, GenomicInteractions for GenomicInteractions, and
>             HTClist for HiTC.
>             These classes seem to share a lot of features, which suggests
>             that they can be (easily?) replaced with a common class. This
>             would have two advantages: first, developers of new and existing
>             packages don't have to continually write and maintain new
>             classes; and second, it gives users a consistent experience
>             across the relevant packages.
>             My question is, does anybody have anything in the pipeline with
>             respect to a base package for an interaction class? If not, I'm
>             planning to put something together for the next BioC release.
>             To this end, I'd welcome any ideas/input/code; the aim is to
>             make a drop-in replacement (insofar as that's possible) for the
>             existing classes in each package.
>             Cheers,
>             Aaron
>             _______________________________________________
>             Bioc-devel at r-project.org mailing list
>             https://stat.ethz.ch/mailman/listinfo/bioc-devel