[R] sparse vectors

Martin Morgan mtmorgan at fhcrc.org
Tue Sep 8 18:31:46 CEST 2009


Hi Robin --

Robin Hankin wrote:
> Hi
> 
> I deal with long vectors almost all of whose elements are zero.
> Typically, the length will be ~5e7 with ~100 nonzero elements.
> 
> I want to deal with these objects using a sort of sparse
> vector.
> 
> The problem is that I want to be able to 'add' two such
> vectors.
> Toy problem follows.  Suppose I have two such objects, 'a' and 'b':

The Bioconductor package IRanges has an Rle (run-length encoding) class
with arithmetic operations defined on it.
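The encoding itself can be seen with base R's rle() and inverse.rle(), which need no Bioconductor. This only illustrates run-length encoding in general, not the IRanges Rle class. Each run is stored as a (length, value) pair, so a vector with about 100 nonzero elements needs at most 201 runs, whatever its length.

```r
## A short, mostly-zero vector
x <- c(0, 0, 0, 2.2, 0, 0, 3.3)

## Encode as runs: each run is a (length, value) pair
r <- rle(x)
r$lengths   # 3 1 2 1
r$values    # 0.0 2.2 0.0 3.3

## Decoding recovers the original vector exactly
stopifnot(identical(inverse.rle(r), x))
```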

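## once only, to install IRanges
## (biocLite.R has since been retired; current Bioconductor
## packages are installed with BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("IRanges")

## load the package
library(IRanges)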

Rle represents runs by their lengths rather than by their end positions,
so a helper is needed to convert from the (end, value) representation:

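ree2Rle <- function(ends, values)
{
    ## untested
    ## 'ends': strictly increasing positions of the nonzero elements
    ## 'values': the elements at those positions
    ## number of zeros preceding each nonzero element
    idx <- diff(c(0, ends)) - 1L
    ## lengths alternate: zero-run, single element, zero-run, ...
    len <- integer(2*length(idx))
    len[c(TRUE, FALSE)] <- idx
    len[c(FALSE, TRUE)] <- 1L

    ## values alternate the same way: 0, values[1], 0, values[2], ...
    val <- vector(typeof(values), 2*length(idx))
    val[c(FALSE, TRUE)] <- values
    Rle(lengths=len, values=val)
}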
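Without Bioconductor, the same conversion can be sketched with base R's "rle" class, which inverse.rle() understands. The function name ree2rle_base is hypothetical, and it assumes 'ends' is strictly increasing. Unlike Rle(), it does not merge adjacent runs that happen to share a value, but it does drop zero-length runs, which arise when two nonzero elements are adjacent or the first one sits at position 1.

```r
## Hypothetical base-R analogue of ree2Rle(); assumes 'ends' is
## strictly increasing and gives the positions of the nonzero values.
ree2rle_base <- function(ends, values)
{
    gaps <- diff(c(0, ends)) - 1       # zeros preceding each nonzero value
    len <- as.vector(rbind(gaps, 1))   # interleave: gap, 1, gap, 1, ...
    val <- as.vector(rbind(0, values)) # interleave: 0, v1, 0, v2, ...
    keep <- len > 0                    # drop empty gaps (adjacent nonzeros)
    structure(list(lengths = as.integer(len[keep]), values = val[keep]),
              class = "rle")
}

## Small example: nonzero values at positions 2 and 5
r <- ree2rle_base(c(2, 5), c(2.2, 3.3))
inverse.rle(r)   # 0.0 2.2 0.0 0.0 3.3
```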

Since we're adding vectors, and R recycles the shorter operand in
arithmetic, we create Rle objects of the same length (by giving b an
explicit 0 at position length(a)):

a <- ree2Rle(c(20,30, 10000000), c(2.2,3.3,4.4))
b <- ree2Rle(c(3, 30, length(a)), c(.1, .1, 0))
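The recycling rule is easy to see with ordinary numeric vectors: when one operand is shorter, its elements are reused from the start. For a sparse vector that would put b's entries at positions where they do not belong, hence the padding above. This sketch uses plain vectors only and makes no claim about how Rle itself handles unequal lengths.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

## y is recycled to c(10, 20, 10, 20) before the addition
x + y   # 11 22 13 24

## If the longer length is not a multiple of the shorter one,
## R still recycles but also issues a warning (suppressed here)
suppressWarnings(c(1, 2, 3) + c(10, 20))   # 11 22 13
```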

and then do the math

> system.time(abPlus <- a + b)
   user  system elapsed
  0.000   0.000   0.001
> abPlus
  'numeric' Rle instance of length 10000000 with 8 runs
  Lengths:  2 1 16 1 9 1 9999969 1
  Values :  0 0.1 0 2.2 0 3.4 0 4.4

The ends (positions of the nonzero elements) are

> cumsum(runLength(abPlus))[runValue(abPlus) != 0]
[1]        3       20       30 10000000

and the corresponding values are

runValue(abPlus)[runValue(abPlus) != 0]
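To show what adding two run-length encodings involves, here is a base-R sketch of the same operation on "rle" objects. It is not how IRanges implements Rle arithmetic, and the name add_rle is hypothetical; it assumes both encodings describe vectors of the same total length. The union of the two sets of run ends splits both vectors into pieces on which each is constant, findInterval() locates the run covering each piece, and neighbouring pieces with equal sums are merged. The work depends on the number of runs, not on the vector length.

```r
## Hypothetical helper: add two base-R "rle" objects that encode
## vectors of the same total length, without expanding them.
add_rle <- function(x, y)
{
    ex <- cumsum(x$lengths)             # run end positions in x
    ey <- cumsum(y$lengths)             # run end positions in y
    stopifnot(ex[length(ex)] == ey[length(ey)])

    ## Every end in x or y closes a piece on which both are constant
    ends <- sort(unique(c(ex, ey)))

    ## Run containing each piece: number of run ends strictly
    ## before the piece's end, plus one
    ix <- findInterval(ends, ex, left.open = TRUE) + 1L
    iy <- findInterval(ends, ey, left.open = TRUE) + 1L
    vals <- x$values[ix] + y$values[iy]

    ## Merge neighbouring pieces whose sums are equal
    keep <- c(vals[-1L] != vals[-length(vals)], TRUE)
    ends <- ends[keep]
    structure(list(lengths = diff(c(0L, ends)), values = vals[keep]),
              class = "rle")
}

## Small check against ordinary (dense) addition
x <- rle(c(0, 0, 5, 0))
y <- rle(c(1, 0, 0, 0))
z <- add_rle(x, y)
inverse.rle(z)                          # 1 0 5 0

## Positions and values of the nonzero elements, mirroring the
## cumsum()/runValue() idiom used with Rle above
cumsum(z$lengths)[z$values != 0]        # 1 3
z$values[z$values != 0]                 # 1 5
```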

Martin


> 
> 
> 
>> a
> $index
> [1]    20   30 100000000
> 
> $val
> [1] 2.2 3.3 4.4
> 
> 
> 
>> b
> $index
> [1]   3  30
> 
> $val
> [1] 0.1 0.1
> 
>>
> 
> 
> What I want is the "sum" of these:
> 
>> AplusB
> $index
> [1]    3   20   30 100000000
> 
> $val
> [1]  0.1 2.2 3.4 4.4
> 
>>
> 
> 
> See how the value for index=30 (being common to both) is 3.4
> (=3.3+0.1).   What's the best R idiom to achieve this?
> 
> 
>
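For the (index, val) representation in the question, a base-R idiom that needs no Bioconductor is to concatenate the indices and values and sum over duplicated indices with rowsum(). The sketch assumes each input's indices are unique; the name sparse_add is hypothetical, and dropping entries that sum to exactly zero is a design choice that keeps the result sparse.

```r
## Hypothetical helper: add two sparse vectors stored as
## list(index = <positions>, val = <values>)
sparse_add <- function(a, b)
{
    idx <- c(a$index, b$index)
    val <- c(a$val, b$val)
    uidx <- sort(unique(idx))           # result positions, ascending
    ## Sum the values sharing a position; group codes 1..k follow uidx
    s <- as.vector(rowsum(val, match(idx, uidx)))
    keep <- s != 0                      # drop entries that cancel out
    list(index = uidx[keep], val = s[keep])
}

a <- list(index = c(20, 30, 100000000), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))
AplusB <- sparse_add(a, b)
```

This gives indices 3, 20, 30, 100000000 with values 0.1, 2.2, 3.4, 4.4, as in the desired AplusB. Because rowsum() only touches the stored entries, the cost depends on the number of nonzero elements rather than on the nominal length of ~5e7.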