[R-sig-hpc] Broadcast large matrix to slaves
Nathan S. Watson-Haigh
nathan.watson-haigh at csiro.au
Wed May 6 00:41:22 CEST 2009
Dirk Eddelbuettel wrote:
> On 5 May 2009 at 14:00, Nathan S. Watson-Haigh wrote:
> | I'd like to broadcast a large symmetric matrix (17190 x 17190) to all
> | slaves. However, I get the following error:
> | Error: serialization is too large to store in a raw vector
> |
> | Is there any way I could send such a large matrix to the slaves? Since it's
> | symmetrical I could send just one triangle along with the diagonal, but I'd like
> | to know why a large matrix can't be sent and if it can be circumvented!
>
>
> edd at ron:~> r -p -e '17190*17190'
> [1] 295496100
> edd at ron:~> r -p -e '17190*17190/1024/1024'
> [1] 281.8
> edd at ron:~>
Actually, at 8 bytes per double, the memory requirement for the matrix is:
> 17190^2*8/(1024^3)
[1] 2.201618
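Spelling the same arithmetic out step by step shows where the 281.8 above comes from: it counts elements in units of 2^20, and each double then needs 8 bytes. The triangle-plus-diagonal count n(n+1)/2 is the standard formula and reproduces the length reported further down; `n` is just the matrix dimension from the question.

```r
n <- 17190                     # dimension of the symmetric matrix
n_elements <- n^2              # 295496100 elements in the full matrix
n_elements / 1024^2            # ~281.8: element count in units of 2^20
n_elements * 8 / 1024^3        # ~2.2 GiB once 8 bytes per double are counted
n_tri <- n * (n + 1) / 2       # 147756645 elements in one triangle incl. diagonal
```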
That's 2.2 GB! I had hoped to be able to do this:
> values <- m[upper.tri(m, diag=TRUE)]
> length(values)
[1] 147756645
> mpi.bcast.Robj2slave(values)
And then in the slave do this:
> make.symmetric <- function(values){
+ # Values specify row-wise lower triangle or column-wise upper triangle
+ # Note:
+ # if length of values isn't triangular, specifically the nth triangular
+ # number, it will fill completely the largest matrix it can and throw the
+ # warning:
+ # "number of items to replace is not a multiple of replacement length"
+ .nth <- floor((sqrt(8*length(values) + 1) - 1) / 2)
+ .matrix <- matrix(0, .nth, .nth)
+ .matrix[upper.tri(.matrix, diag=TRUE)] <- values
+ .matrix <- .matrix + t(.matrix) - diag(diag(.matrix), nrow = .nth)
+ # Could use instead of line above:
+ # .select <- lower.tri(.matrix)
+ # .matrix[.select] <- t(.matrix)[.select]
+ .matrix
+ }
> m <- make.symmetric(values)
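A round trip on a toy matrix is a cheap way to check make.symmetric() before trusting it with 17190 x 17190 data. This sketch repeats the function (with `nrow = .nth` in the inner diag() call, so that a 1 x 1 result is not turned into an identity matrix), builds an arbitrary 5 x 5 symmetric matrix with crossprod(), and checks both reconstruction variants from the comments against it. The toy size, seed, and the name make.symmetric.copy() are illustrative only.

```r
# make.symmetric() from the message, repeated so this block runs on its own.
make.symmetric <- function(values) {
  # Largest n with n * (n + 1) / 2 <= length(values)
  .nth <- floor((sqrt(8 * length(values) + 1) - 1) / 2)
  .matrix <- matrix(0, .nth, .nth)
  .matrix[upper.tri(.matrix, diag = TRUE)] <- values
  .matrix + t(.matrix) - diag(diag(.matrix), nrow = .nth)
}

# Alternative from the comments: copy the upper triangle into the lower one.
make.symmetric.copy <- function(values) {
  .nth <- floor((sqrt(8 * length(values) + 1) - 1) / 2)
  .matrix <- matrix(0, .nth, .nth)
  .matrix[upper.tri(.matrix, diag = TRUE)] <- values
  .select <- lower.tri(.matrix)
  .matrix[.select] <- t(.matrix)[.select]
  .matrix
}

set.seed(1)
a <- matrix(rnorm(25), nrow = 5)
m_small <- crossprod(a)                  # 5 x 5 symmetric test matrix
values_small <- m_small[upper.tri(m_small, diag = TRUE)]
length(values_small)                     # 15, i.e. 5 * 6 / 2
all.equal(make.symmetric(values_small), m_small)        # TRUE
all.equal(make.symmetric.copy(values_small), m_small)   # TRUE
```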
This way, I'm passing just over half the amount of data to the slaves, but I
still get the error:
Error: serialization is too large to store in a raw vector
According to the serialize() documentation:
"A raw vector is limited to 2^31 - 1 bytes, but R objects can exceed this and
their serializations will normally be larger than the objects."
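To see how much serialization adds on top of the data itself, one can compare `8 * length(x)` with the length of the raw vector returned by `serialize(x, connection = NULL)`. For a plain double vector in the default binary format, the difference is only a short fixed-size header. The one-million-element vector is an arbitrary test size.

```r
# Payload of a double vector versus its binary serialization, which is the
# raw vector subject to the 2^31 - 1 byte limit.
x <- runif(1e6)
payload_bytes <- 8 * length(x)
serialized_bytes <- length(serialize(x, connection = NULL))
serialized_bytes - payload_bytes   # a small, fixed-size header
```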
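The size (bytes) of my upper triangle vector is:
> length(values)*8
[1] 1182053160
which is actually below the limit suggested by serialize:
> 2^31 - 1
[1] 2147483647
(The 2363968800 bytes that clearly exceeds it is the full matrix, 17190^2*8.)
So the triangle alone should fit, and whatever pushes it over the limit
presumably comes from the serialization and broadcast path rather than from
the data itself. Either way it's uncomfortably close to the limit, so I will
rethink my approach, possibly loading the data from file in each slave.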
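Following Dirk's suggestion of going through a file, a minimal sketch could write just the upper triangle as raw doubles with writeBin() and rebuild the matrix with readBin() on each slave, which bypasses serialize() and its raw-vector limit entirely. The helper names write_triangle() and read_symmetric(), the integer size header, and the file layout are assumptions for illustration, not part of Rmpi or base R. The file also has to live somewhere every slave can read, such as a shared filesystem.

```r
# Hypothetical helpers (not part of Rmpi or base R).
# Assumed file layout: one integer n, then the n * (n + 1) / 2 doubles of the
# column-wise upper triangle (including the diagonal).
write_triangle <- function(m, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeBin(nrow(m), con)                       # 4-byte integer header
  writeBin(m[upper.tri(m, diag = TRUE)], con)  # 8 bytes per double
  invisible(path)
}

read_symmetric <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  n <- readBin(con, what = "integer", n = 1)
  values <- readBin(con, what = "double", n = n * (n + 1) / 2)
  out <- matrix(0, n, n)
  out[upper.tri(out, diag = TRUE)] <- values
  # Mirror the upper triangle into the lower one.
  out[lower.tri(out)] <- t(out)[lower.tri(out)]
  out
}

# Round trip on a small symmetric matrix.
path <- tempfile(fileext = ".bin")
set.seed(2)
a <- matrix(rnorm(16), nrow = 4)
m4 <- a + t(a)
write_triangle(m4, path)
identical(read_symmetric(path), m4)   # TRUE
```

On the cluster, something along the lines of mpi.bcast.Robj2slave(read_symmetric) and mpi.bcast.Robj2slave(path), followed by mpi.bcast.cmd(m <- read_symmetric(path)), would then leave a copy of m on each slave while only a short path string goes over MPI.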
Cheers,
Nath
>
> 280mb is indeed a lot. A general rule of thumb is to minimise communication
> and maximise computation on the nodes. Maybe in this case you need to save
> the matrix and ship the file to the slaves, and reload it there. Would that
> work?
>
> Dirk
>
--
--------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia
Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
--------------------------------------------------------
More information about the R-sig-hpc
mailing list