[R-sig-hpc] Broadcast large matrix to slaves

Nathan S. Watson-Haigh nathan.watson-haigh at csiro.au
Wed May 6 00:41:22 CEST 2009


Dirk Eddelbuettel wrote:
> On 5 May 2009 at 14:00, Nathan S. Watson-Haigh wrote:
> | I'd like to broadcast a large symmetrical matrix (17190 x 17190) to all
> | slaves. However I get the following error:
> | Error: serialization is too large to store in a raw vector
> | 
> | Is there any way I could send such a large matrix to the slaves? Since it's
> | symmetrical I could send just one triangle along with the diagonal, but I'd like
> | to know why a large matrix can't be sent and if it can be circumvented!
> 
> 
> edd at ron:~> r -p -e '17190*17190'
> [1] 295496100
> edd at ron:~> r -p -e '17190*17190/1024/1024'
> [1] 281.8
> edd at ron:~>

Actually, the memory requirement for the matrix is:
> 17190^2*8/(1024^3)
[1] 2.201618

That's 2.2 GB! I had hoped I'd be able to do this:
> values <- m[upper.tri(m, diag=TRUE)]
> length(values)
[1] 147756645
> mpi.bcast.Robj2slave(values)

And then in the slave do this:
> make.symmetric <- function(values){
+ # Values specify row-wise lower triangle or column-wise upper triangle
+ # Note:
+ #   if length of values isn't triangular, specifically the nth triangular
+ #   number, it will fill completely the largest matrix it can and throw the
+ #   warning:
+ #   "number of items to replace is not a multiple of replacement length"
+ .nth <- floor((sqrt(8*length(values) + 1) - 1) / 2)
+ .matrix <- matrix(0, .nth, .nth)
+ .matrix[upper.tri(.matrix, diag=TRUE)] <- values
+ .matrix <- .matrix + t(.matrix) - diag(diag(.matrix))
+ # Could use instead of line above:
+ #   .select <- lower.tri(.matrix)
+ #   .matrix[.select] <- t(.matrix)[.select]
+ .matrix
+ }
> m <- make.symmetric(values)
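
The pack/unpack round trip above can be checked on a small matrix before
trying it at full size. This is only a sketch: unpack_symmetric() is a
hypothetical name, and it uses the lower.tri variant mentioned in the comments
of make.symmetric() rather than the addition-based line.

```r
# Small symmetric test matrix (5 x 5) standing in for the 17190 x 17190 one.
set.seed(1)
a <- matrix(rnorm(25), 5, 5)
m_small <- a + t(a)

# Pack: column-wise upper triangle, diagonal included.
packed <- m_small[upper.tri(m_small, diag = TRUE)]

# Unpack (hypothetical helper): fill the upper triangle, then mirror it
# into the lower triangle instead of adding the transpose.
unpack_symmetric <- function(values) {
  n <- floor((sqrt(8 * length(values) + 1) - 1) / 2)
  out <- matrix(0, n, n)
  out[upper.tri(out, diag = TRUE)] <- values
  sel <- lower.tri(out)
  out[sel] <- t(out)[sel]
  out
}

restored <- unpack_symmetric(packed)

# The packed vector holds n(n+1)/2 values and the round trip is lossless.
stopifnot(length(packed) == 5 * 6 / 2)
stopifnot(identical(restored, m_small))
```

The mirroring variant avoids the extra full-size temporaries that
.matrix + t(.matrix) - diag(diag(.matrix)) would create, which may matter at
2.2 GB per copy.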

This way, I'm passing just over half the amount of data to the slaves, but I
still get the error:
Error: serialization is too large to store in a raw vector

According to serialize docs:
"A raw vector is limited to 2^31 - 1 bytes, but R objects can exceed this and
their serializations will normally be larger than the objects."

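The size (bytes) of the full matrix is:
> 17190^2*8
[1] 2363968800

Which is bigger than the limit suggested by serialize:
> 2^31 - 1
[1] 2147483647

The upper triangle vector is about half that:
> (length(values)*8)
[1] 1182053160

That is under the limit as raw data, yet the broadcast still fails, so the
serialization step presumably needs more room than the data alone.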
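
One way to stay well clear of the raw-vector limit might be to send the vector
in several pieces and glue them back together on the receiving side. The
sketch below is base R only and assumes a non-empty numeric vector;
split_for_transfer() and its max_bytes budget are hypothetical, and how each
piece is actually sent (for example one mpi.bcast.Robj2slave() call per piece)
is an assumption that is not shown or tested here.

```r
# Hypothetical helper: split a numeric vector into consecutive pieces whose
# payload (8 bytes per double) stays at or below max_bytes.
split_for_transfer <- function(x, max_bytes = 2^30) {
  per_chunk <- max(1, floor(max_bytes / 8))
  starts <- seq(1, length(x), by = per_chunk)
  lapply(starts, function(s) x[s:min(s + per_chunk - 1, length(x))])
}

# Demo on a small vector with a tiny budget (80 bytes = 10 doubles per piece).
x <- as.numeric(1:25)
pieces <- split_for_transfer(x, max_bytes = 80)

# Each piece would be sent separately; the receiver concatenates them in order.
rebuilt <- unlist(pieces)

stopifnot(identical(rebuilt, x))
```

The budget counts payload bytes only, so it should be set comfortably below
2^31 - 1 to leave room for whatever overhead the serialization adds.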

Therefore I will rethink my approach, possibly loading the data from file in
each slave.
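
A minimal sketch of the file-based route, assuming every slave can read the
same path (tempfile() stands in for a shared file system here) and using a
small stand-in vector. load_values() and values_demo are hypothetical names.
Since save() writes to a file rather than returning a raw vector, it should
not hit the same limit, though that is untested at the full size.

```r
# Master side: write the packed values once.
values_demo <- as.numeric(1:15)            # stand-in for the real upper triangle
shared_path <- tempfile(fileext = ".RData")  # assumed to be visible to all slaves
save(values_demo, file = shared_path)

# Slave side (hypothetical helper): load into a private environment so the
# slave's global workspace is not overwritten, then return the vector.
load_values <- function(path) {
  env <- new.env()
  load(path, envir = env)
  get("values_demo", envir = env)
}

loaded <- load_values(shared_path)
stopifnot(identical(loaded, values_demo))

# Clean up the demo file.
unlink(shared_path)
```

Each slave could then rebuild the full matrix locally with
make.symmetric(loaded), so only the file path needs to be communicated.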

Cheers,
Nath



> 
> 280mb is indeed a lot. A general rule of thumb is to minimise communication
> and maximise computation on the nodes.  Maybe in this case you need to save
> the matrix and ship the file to the slaves, and reload it there. Would that
> work?
> 
> Dirk
> 


--
--------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia

Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
--------------------------------------------------------



