[R] R help

Sun Feb 19 20:35:32 CET 2012

On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
> Dear all,
>   I need to generate numbers from multivariate normal with large dimensions
> (5,000,000).

Hi.

I am replying to your first email, since the other did not arrive
to my folder, possibly filtered out by a spam filter. I see them
at the web interface.

1. Error: cannot allocate vector of size 381.5 Mb

The error message makes sense. The matrix requires m*n*8/2^20 MB,
which is in your case

  m <- 100000
  n <- 500
  m*n*8/2^20

  [1] 381.4697

May be, you already have other large objects in the memory. Try to
minimize the number and size of objects, which you need simultaneously
in an R session.

2. Generating a multivariate normal distribution.

As Peter Dalgaard pointed out, a speed up is possible only
for special types of the covariance matrix Sigma. A general
way is to find a matrix A such that A A^t = Sigma. Then,
the vector A X, where X is from N(0,I) and I is an identity
matrix of an appropriate dimension, has covariance Sigma.
This is also the way, how mvtnorm package works.

A speed up is possible, if computing the product A X does not
require to have matrix A explicitly represented in memory.

The matrix A need not be a square matrix. In particular, the
previous case may be understood as using the matrix A, which
for a small m is as follows.

  m <- 5
  rho <- 0.5
  A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m))
  A

            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
  [1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000
  [2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000
  [3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000
  [4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000
  [5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068

  A %*% t(A)

       [,1] [,2] [,3] [,4] [,5]
  [1,]  1.0  0.5  0.5  0.5  0.5
  [2,]  0.5  1.0  0.5  0.5  0.5
  [3,]  0.5  0.5  1.0  0.5  0.5
  [4,]  0.5  0.5  0.5  1.0  0.5
  [5,]  0.5  0.5  0.5  0.5  1.0

This construction is conceptually possible also for a large m because
the structure of A allows to compute A X by simpler operations with
the vector X than an explicit matrix product. Namely, the expression

  rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))

or, more clearly,

  sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m)

is equivalent to the required A X, where X consists of rnorm(1)
and rnorm(m) together.

If you have a specific Sigma, describe it and we can discuss,
whether an appropriate A can be found.

Hope this helps.

Petr Savicky.