[R] arrays of arrays

Claudia Beleites cbeleites at units.it
Wed Nov 10 13:57:39 CET 2010


Hi Sachin,

I guess there are several different possibilities that are more or less handy 
depending on your data:

- Lists were mentioned already, and I think they are the most "natural" 
representation of ragged arrays. They are also very flexible, e.g. you can 
introduce more dimensions. But they can get terribly slow and very memory 
consuming if you have many rows.

- If you have many rows and they have almost the same number of elements, you 
may be better off using a normal matrix and setting the unused elements to NA.

- There are also sparse matrices in package Matrix. I've never used them, but I 
guess they may be what you are after.

This object:
new("dgCMatrix"
     , i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
     , p = c(0L, 4L, 7L, 9L, 15L)
     , Dim = c(6L, 4L)
     , Dimnames = list(NULL, NULL)
     , x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
     , factors = list()
)

is the transpose of your example:
6 x 4 sparse Matrix of class "dgCMatrix"

[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6

The numeric versions do not store the zeros, and will return 0 for the 
elements marked with '.' in the printed output.

You won't get any benefit from this representation in terms of memory (over a 
normal matrix), because each stored value also carries an index, unless the 
total number of elements is smaller than roughly
nrow * max (elements per row) / 2 - nrow - some more overhead

The Matrix () function will give you a hint: check whether it produces a dense 
or a sparse matrix.

- If you are terribly tight on memory, you can program your own representation 
that just stores a vector of your values and a start index for each row.
You then find element j of row i at position rowstart [i] + j.
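
A minimal sketch of how such a representation could be accessed, assuming the 
same convention as in the comparison below (rowstart [i] is the index of the 
last element before row i). The helper names ragged_get and ragged_row are 
made up for illustration and do not come from any package.

```r
# Flat vector of all values, rows stored one after another.
values <- c(0, 0, 1, 1,  1, 3, 5,  4, 4,  7, -1, 8, 9, 10, 6)
# Offset of each row: index of the last element *before* the row starts.
rowstart <- c(0, 4, 7, 9)

# Hypothetical helper: element j of row i.
# No bounds check: a j beyond the row length silently reads the next row.
ragged_get <- function(values, rowstart, i, j) {
  values[rowstart[i] + j]
}

# Hypothetical helper: all elements of row i.
ragged_row <- function(values, rowstart, i) {
  # The last index of row i is the offset of row i + 1
  # (or the end of the vector for the final row).
  end <- c(rowstart[-1], length(values))[i]
  values[rowstart[i] + seq_len(end - rowstart[i])]
}

ragged_get(values, rowstart, 2, 3)  # third element of the second row
ragged_row(values, rowstart, 4)     # the whole fourth row
```

The row lengths are not stored separately here; they follow from the 
differences between consecutive offsets, which keeps the overhead at one 
number per row.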

Here's a comparison:

# list
 > l <- structure(list(V1 = c(0, 0, 1, 1), V2 = c(1, 3, 5), V3 = c(4,
4), V4 = c(7, -1, 8, 9, 10, 6)), .Names = c("V1", "V2", "V3",
"V4"))
 > str (l)
List of 4
  $ V1: num [1:4] 0 0 1 1
  $ V2: num [1:3] 1 3 5
  $ V3: num [1:2] 4 4
  $ V4: num [1:6] 7 -1 8 9 10 6

 > object.size (l)
736 bytes

# sparse matrix
 > s <- new("dgCMatrix"
     , i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
     , p = c(0L, 4L, 7L, 9L, 15L)
     , Dim = c(6L, 4L)
     , Dimnames = list(NULL, NULL)
     , x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
     , factors = list()
)
 > s
6 x 4 sparse Matrix of class "dgCMatrix"

[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6
 > object.size (s)
1640 bytes
# there's a lot of overhead for the sparse matrix

# matrix
 > m <- structure(c(0, 1, 4, 7, 0, 3, 4, -1, 1, 5, NA, 8, 1, NA, NA,
9, NA, NA, NA, 10, NA, NA, NA, 6), .Dim = c(4L, 6L))
 > m
      [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    1    1   NA   NA
[2,]    1    3    5   NA   NA   NA
[3,]    4    4   NA   NA   NA   NA
[4,]    7   -1    8    9   10    6
 > object.size (m)
392 bytes


# own representation
 > o <- structure(c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6),
                  rowstart = c(0, 4, 7, 9))
# rowstart holds the index of the last element of the previous row;
# this saves subtracting 1 on every access

 > o
  [1]  0  0  1  1  1  3  5  4  4  7 -1  8  9 10  6
attr(,"rowstart")
[1] 0 4 7 9
 > object.size (o)
352 bytes

 > o [attr (o, "rowstart") [2] + 3 ]
[1] 5
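
The matrix and the flat vector above were typed in by hand. As a rough sketch, 
and only one of several possible ways, both could be derived from the list l 
like this (the variable n is introduced here for illustration):

```r
l <- list(V1 = c(0, 0, 1, 1), V2 = c(1, 3, 5), V3 = c(4, 4),
          V4 = c(7, -1, 8, 9, 10, 6))

# Number of elements in each row (names dropped to keep attributes clean).
n <- vapply(l, length, integer(1), USE.NAMES = FALSE)

# NA-padded matrix: one row per list element, unused cells stay NA.
m <- matrix(NA_real_, nrow = length(l), ncol = max(n))
for (i in seq_along(l)) {
  m[i, seq_len(n[i])] <- l[[i]]
}

# Flat vector plus row offsets: the offset of row i is the total
# length of all rows before it, so the first offset is 0.
o <- structure(unlist(l, use.names = FALSE),
               rowstart = c(0, cumsum(n))[seq_along(n)])

o[attr(o, "rowstart")[2] + 3]  # third element of the second row
```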

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it


