[Bioc-devel] pointer and big matrix in R

Martin Morgan mtmorgan at fhcrc.org
Fri Apr 12 19:44:55 CEST 2013


On 04/12/2013 10:02 AM, Servant Nicolas wrote:
> Dear all,
>
> I have an S4 object (HTCexp from the HiTC package), composed of one big matrix and two GRanges objects, A and B, which describe the matrix rows and columns.
> I am thinking about a way to reduce the memory size of this object.
> I also have methods to get/set the matrix and the two GRanges, namely intdata(), x_intervals(), y_intervals().
>
> In the case of a symmetric matrix, the two GRanges can be the same, so I was interested in simply making B a pointer to A in this case. How can I do this in R?
> Second, I'm wondering whether there are other matrix-like objects optimized for big matrices (5000 x 5000). I quickly looked at the Matrix package from CRAN, which is useful for sparse matrices.
> Any suggestion would be appreciated !

This is not a super-big object (a 5000 x 5000 matrix of doubles takes about
200 MB), so perhaps you're running into problems with R's propensity to copy
data? An easy solution might be to re-use the SummarizedExperiment class, which
addresses this issue by placing the 'assays' data in a reference class.
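The reference-class idea can be illustrated in plain R. The following is a minimal sketch, not how SummarizedExperiment itself is implemented: it relies only on the fact that environments (the mechanism R reference classes are built on) are not copied on assignment, so two handles can share one matrix. The helper name `makeShared` is hypothetical.

```r
## Hypothetical helper: wrap a matrix in an environment. Environments are
## passed by reference, so assigning the wrapper does not copy the matrix.
makeShared <- function(m) {
    e <- new.env(parent = emptyenv())
    e$data <- m
    e
}

a <- makeShared(matrix(0, 3, 3))
b <- a                  # 'b' refers to the same environment as 'a'
b$data[1, 1] <- 42      # modify through 'b'...
a$data[1, 1]            # ...and the change is visible through 'a' (42)
```

The flip side of this design is that a change made through one handle is seen by every other handle, which is one reason to keep access to the shared data behind accessor functions.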

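     library(GenomicRanges)

     ## HTCexp extends SummarizedExperiment: the matrix is stored as the
     ## assay and the row GRanges as rowData, so only the column GRanges
     ## needs an extra slot
     .HTCexp = setClass("HTCexp", contains="SummarizedExperiment",
       representation=representation(y_intervals="GenomicRanges"))

     ## constructor: 'intdata' becomes the single assay, 'x_intervals' the rowData
     HTCexp <-
         function(intdata = matrix(0, 0, 0), x_intervals=GRanges(),
                  y_intervals=GRanges(), ...)
     {
         .HTCexp(SummarizedExperiment(intdata, rowData=x_intervals),
                 y_intervals=y_intervals, ...)
     }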

which already gives

 > HTCexp()
class: HTCexp
dim: 0 0
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames: NULL
colData names(0):
 > m <- matrix(0, 5000, 5000,
+             dimnames=list(seq_len(5000), seq_len(5000)))
 > g <- GRanges("A", IRanges(1:5000, width=0))
 > HTCexp(m, g, g)
class: HTCexp
dim: 5000 5000
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames(5000): 1 2 ... 4999 5000
colData names(0):

I think you'd need to implement "[" and a 'y_intervals' accessor:


     ## accessor for the GRanges describing the columns
     setGeneric("y_intervals", function(x, ...) standardGeneric("y_intervals"))

     setMethod("y_intervals", "HTCexp", function(x, ...) {
         x@y_intervals
     })

     setMethod("[", "HTCexp", function(x, i, j, ..., drop=TRUE) {
         ## not sure that this is complete...
         if (missing(i) && missing(j))
             x
         else {
             se <- as(x, "SummarizedExperiment")
             if (missing(i))
                 initialize(x, se[,j], y_intervals=y_intervals(x)[j])
             else if (missing(j))
                 initialize(x, se[i,])
             else
                 initialize(x, se[i,j], y_intervals=y_intervals(x)[j])
         }
     })
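To see the subsetting logic in isolation, here is a simplified, self-contained sketch that needs only the methods package. It is hypothetical: the class `AnnotMatrix` stores character vectors in place of the GRanges and a plain matrix in place of the SummarizedExperiment assay, so it runs without Bioconductor. The point it illustrates is the same as above: row indices subset the matrix and the row annotation, column indices subset the matrix and the column annotation.

```r
library(methods)

## Hypothetical stand-in for HTCexp: character vectors replace the
## GRanges describing rows (x_intervals) and columns (y_intervals)
setClass("AnnotMatrix",
         representation(data = "matrix",
                        x_intervals = "character",
                        y_intervals = "character"))

setMethod("[", "AnnotMatrix", function(x, i, j, ..., drop = TRUE) {
    ## a missing index means "keep all rows" or "keep all columns"
    if (missing(i)) i <- seq_len(nrow(x@data))
    if (missing(j)) j <- seq_len(ncol(x@data))
    ## 'drop' is ignored: the result is always an AnnotMatrix
    initialize(x,
               data = x@data[i, j, drop = FALSE],
               x_intervals = x@x_intervals[i],
               y_intervals = x@y_intervals[j])
})

ids <- c("a", "b", "c")
am <- new("AnnotMatrix", data = matrix(1:9, 3, 3),
          x_intervals = ids, y_intervals = ids)
sub <- am[2:3, 1]
sub@x_intervals   # "b" "c"
sub@y_intervals   # "a"
```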

Martin

>
> Thank you
> Regards
> Nicolas
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


