[R] computing var-covar matrix with much missing data

Peter Langfelder peter.langfelder at gmail.com
Mon Jan 31 18:51:54 CET 2011


On Mon, Jan 31, 2011 at 9:30 AM, Mike Miller <mbmiller+l at gmail.com> wrote:
> Is there an R function for computing a variance-covariance matrix that
> guarantees that it will have no negative eigenvalues?  In my case, there is
> a *lot* of missing data, especially for a subset of variables.  I think my
> tactic will be to compute cor(x, use="pairwise.complete.obs") and then pre-
> and post-multiply by a diagonal matrix of standard deviations that were
> computed based on all non-missing observations.  Or maybe cov() would do
> exactly that with use="pairwise.complete.obs", but that isn't really clear
> from the docs.  Next I would test to see if what I have is positive
> definite.  If the correlation matrix is positive definite, then the
> covariance matrix will be.
>
> Maybe I'll be lucky, but I need a positive-definite matrix, and this method
> is not guaranteed to produce one.  Any ideas?
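
On whether cov() with use="pairwise.complete.obs" is the same as rescaling
the pairwise correlation matrix: a minimal sketch on simulated data, not on
the poster's data. The matrix x, its size and the 30% missingness rate are
all made up for illustration. As far as I understand the docs, the pairwise
option uses only the complete pairs for each pair of variables, so the
diagonals of the two results should agree while the off-diagonals generally
should not.

```r
set.seed(1)
n <- 200; p <- 4
x <- matrix(rnorm(n * p), n, p)       # simulated data (assumption)
x[, 2] <- x[, 1] + rnorm(n)           # induce some correlation
x[sample(n * p, 0.3 * n * p)] <- NA   # about 30% of cells set to missing

# Route 1: pairwise correlations, rescaled by standard deviations
# computed from all non-missing values of each variable
r_pair     <- cor(x, use = "pairwise.complete.obs")
s_all      <- apply(x, 2, sd, na.rm = TRUE)
cov_scaled <- diag(s_all) %*% r_pair %*% diag(s_all)

# Route 2: let cov() do the pairwise deletion itself
cov_pair <- cov(x, use = "pairwise.complete.obs")

# Compare the two routes: diagonals, then largest absolute difference
print(all.equal(diag(cov_scaled), diag(cov_pair)))
print(max(abs(cov_scaled - cov_pair)))

# Positive definiteness check: all eigenvalues must be > 0
ev_scaled <- eigen(cov_scaled, symmetric = TRUE, only.values = TRUE)$values
ev_pair   <- eigen(cov_pair,   symmetric = TRUE, only.values = TRUE)$values
print(c(scaled = min(ev_scaled), pairwise = min(ev_pair)))
```

Neither route guarantees a positive smallest eigenvalue, so the eigenvalue
check is needed either way.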

You may get lucky and your matrix (cov or cor) may be positive
definite. If not, you may want to think about imputing the missing
data, which may be better than trying to massage a covariance matrix
into being positive definite. You could also try a hybrid approach of
deleting observations with a lot of missing data and imputing only the
ones that are left over.
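
A rough sketch of the hybrid idea, with loud caveats: the data are simulated,
the 50% cutoff for dropping rows is an arbitrary assumption, and column-mean
imputation is only a placeholder that keeps the example self-contained. Mean
imputation shrinks variances and covariances, so a proper imputation method
would be preferable in practice.

```r
set.seed(2)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)        # simulated data (assumption)
x[sample(n * p, 0.25 * n * p)] <- NA   # about 25% of cells set to missing

# Step 1: drop observations (rows) with a lot of missing data.
# The 0.5 threshold is an arbitrary choice for illustration.
frac_missing <- rowMeans(is.na(x))
x_kept <- x[frac_missing <= 0.5, , drop = FALSE]

# Step 2: impute what is left. Column means are a naive stand-in
# for a real imputation method.
col_means <- colMeans(x_kept, na.rm = TRUE)
x_imp <- x_kept
idx <- which(is.na(x_imp), arr.ind = TRUE)
x_imp[idx] <- col_means[idx[, "col"]]

# Step 3: covariance of the completed data. Computed from a complete
# data matrix, it cannot have negative eigenvalues beyond rounding error.
s  <- cov(x_imp)
ev <- eigen(s, symmetric = TRUE, only.values = TRUE)$values
print(nrow(x_kept))
print(min(ev))
```

Because the covariance is computed from a complete matrix, it is positive
semidefinite by construction, and strictly positive definite as long as there
are more rows than columns and no column is a linear combination of the others.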

Peter


