[R] "Justify" PCA? -- was: Bartlett's Test of Sphericity

Sat Jun 18 19:43:32 CEST 2011

On Jun 18, 2011, at 16:26 , Bert Gunter wrote:

> Apologies for the obvious, but just to clarify: there is no reason to
> "justify" a PCA -- it's just an eigen decomposition of a matrix and is
> therefore "justified" by linear algebra.
> 
> If one wants to determine whether some subset of the eigenvectors =
> principal components suffice to "represent" the data in some sense,
> then that is where distributional considerations would come into play.
> But that is another (often unsatisfactory) story, typically irrelevant
> in the exploratory context where PCA is often used.

Yes, I was wondering about that too. PCA on independent variables just sorts them by variance. PCA on their correlation matrix is essentially a random orthogonal rotation, since that matrix is close to the identity and its eigenvectors are then determined by sampling noise. So PCA is nonsensical if there is no correlation, but it can be pretty useless even if there is.
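
To make the point about independent variables concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the normally distributed data, the standard deviations 3, 1 and 2, the sample size, the seed, and the helper names. The small Jacobi routine merely stands in for whatever eigen solver one would normally use (in R, eigen() or prcomp()).

```python
import math
import random

def covariance(columns):
    """Sample covariance matrix of equal-length columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]

def correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def jacobi_eigen(sym, sweeps=50, tol=1e-18):
    """Eigenvalues (descending) and eigenvectors of a small symmetric matrix,
    computed with cyclic Jacobi rotations."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Angle that zeroes the (i, j) entry of J^T A J.
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):          # columns: A <- A J
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):          # rows: A <- J^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
                for k in range(p):          # accumulate eigenvectors: V <- V J
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki + s * vkj, -s * vki + c * vkj
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for r in range(p)] for k in order]
    return values, vectors

random.seed(1)
n = 20000
sds = [3.0, 1.0, 2.0]  # hypothetical standard deviations of three independent variables
columns = [[random.gauss(0.0, sd) for _ in range(n)] for sd in sds]

cov = covariance(columns)
cov_values, cov_vectors = jacobi_eigen(cov)
# Index of the variable with the largest absolute loading in each component.
dominant = [max(range(len(vec)), key=lambda i: abs(vec[i])) for vec in cov_vectors]
cor_values, _ = jacobi_eigen(correlation(cov))

print("covariance eigenvalues:", [round(x, 2) for x in cov_values])
print("dominant variable per component:", dominant)
print("correlation eigenvalues:", [round(x, 3) for x in cor_values])
```

On the covariance matrix, the eigenvalues land close to the three variances and each component loads almost entirely on a single variable, so the "components" are just the original variables in order of decreasing variance. On the correlation matrix, all eigenvalues sit near 1, so the ordering and the directions carry no information.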

Apparently the KMO/Bartlett "justification" comes out of SPSS usage, where a subculture has emerged in which it is conventional to cite those two quantities. If you google for "KMO", you'll find oodles of papers using the statistics, but precious few pages actually discussing or even defining them. Shame; the "adequate sampling" notion underlying the KMO measure could do with a qualified discussion.
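
For reference, here is a minimal sketch of the two quantities in the form usually given in textbooks, as far as I know: Bartlett's statistic is -(n - 1 - (2p + 5)/6) * log(det(R)) on p(p - 1)/2 degrees of freedom, and the overall KMO measure is the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The correlation matrix, the sample size of 100, and all function names below are made up for illustration, and nothing here is checked against SPSS output.

```python
import math

def inverse_and_det(matrix):
    """Inverse and determinant of a small square matrix by Gauss-Jordan elimination."""
    p = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        scale = aug[col][col]
        det *= scale
        aug[col] = [x / scale for x in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[p:] for row in aug], det

def bartlett_sphericity(corr, n_obs):
    """Bartlett's sphericity statistic and its degrees of freedom."""
    p = len(corr)
    _, det = inverse_and_det(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6.0) * math.log(det)
    return chi2, p * (p - 1) // 2

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix."""
    p = len(corr)
    inv, _ = inverse_and_det(corr)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)

# Hypothetical correlation matrix and sample size, for illustration only.
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
n_obs = 100

chi2, df = bartlett_sphericity(corr, n_obs)
kmo_value = kmo(corr)
print("Bartlett chi-square:", round(chi2, 2), "on", df, "df")
print("KMO:", round(kmo_value, 3))
```

Note that both quantities are functions of the correlation matrix alone (plus n for Bartlett), so they say something about how far the matrix is from the identity, not about whether the resulting components are useful.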

(Within such subcultures there often arises an ideology that software is somehow flawed if it does not provide their favorite quantities, relevant or not. It is really just classical group dynamics, as in "you can't go to the opera if you don't own a tuxedo". See also "bandwagon effect".)

-pd

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com