[R] Covariance of data with missing values.

Thu Aug 16 01:10:36 CEST 2007

On 15-Aug-07 21:16:32, Rolf Turner wrote:
> 
> I have a data matrix X (n x k, say) each row of which constitutes
> an observation of a k-dimensional random variable which I am willing,
> if not happy, to assume to be Gaussian, with mean ``mu'' and
> covariance matrix ``Sigma''.  Distinct rows of X may be assumed to
> correspond to independent realizations of this random variable.
> 
> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
> missing values.
> [...]

One question, Rolf: How big is k (the number of columns)?

If it's greater than 30, you may have problems with 'norm', since the
function prelim.norm() builds up its image of the places where there
are missing values as "packed integers" with code:

    r <- 1 * is.na(x)
    ....
    mdp <- as.integer((r %*% (2^((1:ncol(x)) - 1))) + 1)

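i.e. 'r' would be n x k and have 1s where your X has missing values,
0s elsewhere. Then each row of 'r' is packed into a 32-bit integer
(the binary number whose "1" bits correspond to the 1s in that row,
plus 1). R's integers only go up to 2^31 - 1, so with k>30 a row's
code can exceed that limit; as.integer() then gives NA with a warning,
and things could go wrong!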
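
To see the packing and the overflow at work, here is a small sketch
using the same two lines of code as above. The toy matrix and the
names 'r_all' and 'code' are made up for illustration and are not part
of 'norm'; I have not checked what prelim.norm() does with the result
beyond the lines quoted.

```r
# Toy data (hypothetical): 3 rows, 4 columns, some values missing
x <- matrix(c( 1, NA,  3,  4,
              NA, NA,  7,  8,
               9, 10, 11, 12), nrow = 3, byrow = TRUE)

# The two lines quoted from prelim.norm()
r   <- 1 * is.na(x)                                   # 1 = missing, 0 = observed
mdp <- as.integer((r %*% (2^((1:ncol(x)) - 1))) + 1)  # one packed code per row
mdp
# row 1: column 2 missing       -> 2^1 + 1       = 3
# row 2: columns 1 and 2 missing -> 2^0 + 2^1 + 1 = 4
# row 3: nothing missing        -> 0 + 1         = 1

# With k = 31, a row that is entirely missing gives a code of 2^31,
# which is one more than the largest integer R can hold
k     <- 31
r_all <- matrix(1, nrow = 1, ncol = k)
code  <- (r_all %*% (2^((1:k) - 1))) + 1
code > .Machine$integer.max                  # TRUE
overflowed <- suppressWarnings(as.integer(code))
is.na(overflowed)                            # TRUE: coerced to NA
```

So a quick check of ncol(X) > 30 before calling prelim.norm() would
tell you whether you are at risk.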

In that case, I hope Chuck's suggestion works!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-07                                       Time: 00:10:33
------------------------------ XFMail ------------------------------