[R] non-ideal behavior in princomp/ not a feature but a bug

Fri Sep 29 09:33:14 CEST 2000

... I checked and Brian and I are both right (see bottom for prior mail
exchange). 

Let me explain:
=============================================================
1. Indeed, in principle, princomp allows data matrices with are wider than
high.
Example:
> x1
     [,1] [,2] [,3] [,4]
[1,]    1    1    2    2
[2,]    1    1    2    2
> princomp(x1)
Call:
princomp(x = x1)

Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 
     0      0      0      0 

4  variables and  2 observations.

So far so good
===========================================================

2. However, not all matrices work, such as the first one I tried yesterday.
Here is
an example of a matrix which does not work.
> x
     [,1] [,2] [,3] [,4]
[1,] -0.4 -1.6 -1.0 -1.8
[2,] -2.2 -0.3  1.4  0.2
> princomp(x)
Error in princomp(x) : covariance matrix is not non-negative definite
> 

Seeing this behavior on the matrix I tried, I hastily concluded that
princomp was not
suitable for this type (as similar functions are in many other software
packages which
are written for psychologists but not for chemometricians).

=======================================================
However, the problem is not with the design of princomp in general, but with
a little
check it performs inside, which can have problems with roundoff:

    if (any(edc$values < 0)) 
        stop("covariance matrix is not non-negative definite")

As it happens, the covariance of x1 has eigenvalues of:
> eigen(cov.wt(x1)$cov)$values
[1] 0 0 0 0
and x has eigenvalues
> eigen(cov.wt(x)$cov)$values
[1]  7.345000e+00  1.110223e-16  0.000000e+00 -1.110223e-16
Therefore, the observed behavior is logical (but wrong).

Oh, by the way:
         _         
platform Windows   
arch     x86       
os       Win32     
system   x86, Win32
status             
major    1         
minor    1.0       
year     2000      
month    June      
day      15        
language R         
> 
-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Thursday, September 28, 2000 2:16 PM
To: r-help at stat.math.ethz.ch; Ritter, Christian C SRTCL-CTGAS
Subject: Re: [R] non-ideal behavior in princomp

> From: "Ritter, Christian C SRTCL-CTGAS" <Christian.C.Ritter at OPC.shell.com>
> Date: Thu, 28 Sep 2000 12:59:25 +0200
> 
> This problem is not limited to R, but R is one of the packages in which it
> arises.

I'm sorry, but I must be missing something here. I think R (and S)
do already do what you want here.

> princomp is a nice function which creates an object for which inspection
> methods have been written.

But there is also prcomp.  (What are `inspection functions'?  Do you
mean methods for what Bill Venables sometimes calls accessor functions?)
princomp is written for more flexibility, for example to use robust
methods of extracting PCs.

> Unfortunately, princomp does not admit cases in which the x matrix is
wider
> than high (i. e. more variables than observations). Such cases are typical

Really?

> in spectroscopy and related disciplines. It would be nice if the following
> two features were added to princomp:
> 
> 1. as a default behavior, princomp should digest wider than high matrices
> (it should just compute the nontrivial principal compontents). 

prcomp does exactly that, although there are problems for which the
trivial PCs are the interesting ones!

What problems are you finding with princomp?  If I give it a n x p matrix
with n < p, I get (correctly) p PCs, with (effectively) 0 standard
deviations for the last n-p.  It's possible that this does not always
work for numerical reasons, but can you describe the `problem' you
feel it has?  (Perhaps the criterion for non-negative-definiteness is
too stringent?)

> 2. as an optional behavior, princomp should only calculate (and return)
the
> first A principal components. This is useful, if x is very wide and very
> short. 

Well, only min(n, p) PC's are defined, so I presume you want A < min(n, p)?
Then you just extract the first A PCs.  If A << min(n, p) there are faster
algorithms, but they are not currently implemented in R, and I can
think of very few problems where the speed difference would be noticeable.

> Some might remark that all this can be done easily by svd or by
programming
> the NIPALS algorithm, but I would prefer to see it in the common version. 

It is, in prcomp which uses the svd.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._