[Bioc-devel] Base class for interaction data - expressions of interest
Aaron Lun
alun at wehi.edu.au
Mon Nov 16 11:31:33 CET 2015
Thanks for the comment, Nadhir.
I had considered the use of a sparse matrix class. The reason I didn't
implement it originally is because truly sparse interaction data would
be better represented by just working with the pairwise format in the
InteractionSet. You need the row/column indices to pass to the
sparseMatrix constructor anyway; a memory-efficient algorithm to do, for
example, compartment identification could just use that directly.
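
As a minimal sketch of what "using the pairwise format directly" could look like (this is not code from the InteractionSet package; the `pairs` data frame and its `anchor1`/`anchor2`/`count` columns are hypothetical names chosen for illustration), the per-bin coverage of a symmetric contact map can be computed from the index/count vectors alone. These are the same vectors one would otherwise pass as `i`, `j` and `x` to `Matrix::sparseMatrix()`:

```r
# Hypothetical pairwise representation: one row per interacting bin pair.
# Only one triangle of the symmetric contact map is stored.
nbins <- 5L
pairs <- data.frame(
    anchor1 = c(1L, 2L, 3L, 5L),
    anchor2 = c(1L, 1L, 2L, 4L),
    count   = c(10, 4, 7, 2)
)

# Each off-diagonal pair contributes to the coverage of both of its bins;
# a diagonal pair (anchor1 == anchor2) contributes only once.
off_diag <- pairs$anchor1 != pairs$anchor2
idx <- c(pairs$anchor1, pairs$anchor2[off_diag])
val <- c(pairs$count, pairs$count[off_diag])

# Sum counts per bin without allocating an nbins x nbins matrix.
agg <- rowsum(val, group = idx)
coverage <- numeric(nbins)
coverage[as.integer(rownames(agg))] <- agg[, 1]
coverage
```

The result equals the row sums of the full symmetric matrix, but memory use scales with the number of stored pairs rather than with the square of the number of bins.
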
Most existing algorithms for doing this (e.g., k-means/hierarchical
clustering) won't operate natively from a sparseMatrix, and I suspect
they'll just run as.matrix() and convert it to a full matrix. Obviously,
this would defeat the purpose of using a sparse matrix. So, if you have
to rewrite the algorithms anyway, you might as well rewrite them in a
manner that avoids needing the sparseMatrix() as a middleman.
Nonetheless, it's a good point about memory usage. I'll have a think
about it; sparseMatrix() would help a bit, but as coverage increases for
these experiments, the matrix will probably become fairly dense (even if
it's just counts of 1 for some bin pairs). Even now, for compartment
detection, the bins involved are large enough that sparseness usually
isn't observed. Perhaps big.matrix() from the bigmemory package might be
a better choice.
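
To put rough numbers on the density argument, here is a back-of-envelope sketch. The byte sizes (8 per double, 4 per integer index) are standard for R, but the bin count and densities are illustrative assumptions rather than measurements, and the helper names `dense_bytes()` and `triplet_bytes()` are made up for this example. Symmetry and object overhead are ignored:

```r
# Approximate bytes for a full n x n matrix of doubles.
dense_bytes <- function(n) 8 * n^2

# Approximate bytes for a pairwise (triplet) representation: two integer
# indices (4 bytes each) plus one double value (8 bytes) per stored pair.
triplet_bytes <- function(nnz) (4 + 4 + 8) * nnz

n <- 3000                       # assumed number of bins (illustrative)
density <- c(0.01, 0.1, 0.5, 1) # assumed fraction of non-empty bin pairs

data.frame(
    density    = density,
    dense_MB   = dense_bytes(n) / 1e6,
    triplet_MB = triplet_bytes(density * n^2) / 1e6
)
```

Under these assumptions the two representations cost the same once half of the bin pairs are non-empty, and the pairwise form is the more expensive one beyond that point.
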
Cheers,
Aaron
On 16/11/15 09:58, DJEKIDEL MOHAMED NADHIR wrote:
> Hi Aaron,
>
> Sounds like a great initiative.
> I just have some comments about the ContactMatrix-Class.
>
> I think that, with increasing Hi-C resolution, using the matrix class
> will consume a lot of memory.
> Maybe using sparseMatrix from the Matrix package would give a smaller
> memory footprint.
>
> It can also be manipulated in C++ using RcppEigen if, for example, you
> plan to add functionality such as AB domains or insulation scores, etc.
>
> Regards,
>
> - Nadhir
>
> On Mon, Nov 16, 2015 at 5:33 PM, Aaron Lun <alun at wehi.edu.au> wrote:
>
> Hello all,
>
> I thought I might give an update on the state of affairs for the
> InteractionSet package. Currently, there are three classes:
>
> - the GInteractions class, inheriting from Vector and intended to
> represent pairwise interactions between genomic regions (based on
> suggestions from Malcolm Perry and Liz Ing-Simmons).
>
> - the InteractionSet class, inheriting from SummarizedExperiment0
> and containing a GInteractions object; intended to store
> experimental data about pairwise interactions (one interaction per row).
>
> - the ContactMatrix class, inheriting from Annotated and storing
> data in matrix form (where rows/columns represent genomic regions).
>
> Getters, setters, conversion methods between classes, distance
> calculation methods and overlap methods have been implemented. Man
> pages and "testthat" scripts have also been written. Still missing a
> vignette, though it should be easy enough to write one.
>
> All in all, I think it's a solid first draft. Any comments would be
> appreciated.
>
> Cheers,
>
> Aaron
>
> On 08/11/15 19:31, Aaron Lun wrote:
>
> Okay, some meat and bones are on GitHub now:
>
> https://github.com/LTLA/InteractionSet
>
> The idea is to represent genomic interactions as pairs of genomic
> regions, using indices to point to a common GRanges object (a la Hits,
> though I haven't used that explicitly due to the presence of additional
> constraints on the indices). Data for each interaction is stored using a
> SummarizedExperiment framework (one row per interaction).
>
> With regards to the methods, most of the low-hanging fruit has been
> implemented, courtesy of inheriting from SummarizedExperiment0. I'll add
> proper unit tests over the coming week. It currently passes through R
> CMD check okay, except for a warning about ":::" in the cbind/rbind
> definitions (callNextMethod() didn't seem to work inside those methods,
> and I didn't want to rewrite the SE0 binding methods).
>
> Any thoughts appreciated.
>
> - Aaron
>
> On 07/11/15 19:33, Morgan, Martin wrote:
>
> Just to say that this is a great idea. If this starts as a GitHub
> package (or in svn, we can create a location for you if you'd like) I
> and others would, I am sure, be happy to try to provide any guidance /
> insight. The main design principles are probably to reuse as much as
> possible from existing classes, especially the S4Vectors / GRanges
> world, and to integrate metadata as appropriate (like
> SummarizedExperiment, for instance).
>
> Martin
> ________________________________________
> From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of
> Aaron Lun [alun at wehi.edu.au]
> Sent: Thursday, November 05, 2015 12:27 PM
> To: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] Base class for interaction data -
> expressions of interest
>
> There's a growing number of Bioconductor packages dealing with
> interaction data; diffHic, GenomicInteractions, HiTC, to name a few (and
> probably more in the future). Each of these packages defines its own
> class to store interaction data - DIList for diffHic,
> GenomicInteractions for GenomicInteractions, and HTClist for HiTC.
>
> These classes seem to share a lot of features, which suggests that they
> can be (easily?) replaced with a common class. This would have two
> advantages - one, developers of new and existing packages don't have to
> continually write and maintain new classes; and two, it provides users
> with a consistent user experience across the relevant packages.
>
> My question is, does anybody have anything in the pipeline with respect
> to a base package for an interaction class? If not, I'm planning to put
> something together for the next BioC release. To this end, I'd welcome
> any ideas/input/code; the aim is to make a drop-in replacement (insofar
> as that's possible) for the existing classes in each package.
>
> Cheers,
>
> Aaron
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>