[R] PCA and categorical data

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 6 10:25:26 CET 2009


You might want to look into correspondence analysis, which has several 
variants of PCA designed for categorical data.
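Simple correspondence analysis can be sketched as the same eigen/singular-value decomposition that underlies PCA. Instead of centred raw data, it is applied to the matrix of standardized (Pearson) residuals of a two-way table. What follows is a minimal pure-Python sketch under these assumptions: one two-way table of counts with no empty rows or columns, a made-up example table, and only the first axis extracted, by power iteration rather than a full SVD. It illustrates the idea and is not the implementation in any R package. In R one would normally use corresp() or mca() from the MASS package. Multiple correspondence analysis applies the same idea to the indicator matrix of several factors.

```python
import math


def standardized_residuals(table):
    """Correspondence-analysis residual matrix for a two-way table of counts.

    S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p = table / n,
    r holds the row masses and c the column masses (all assumed > 0).
    """
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]                                # row masses
    c = [sum(row[j] for row in p) for j in range(len(p[0]))]   # column masses
    S = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    return S, r, c


def total_inertia(S):
    """Sum of squared standardized residuals, i.e. Pearson chi-squared / n."""
    return sum(x * x for row in S for x in row)


def first_axis(S, c, iters=1000):
    """First principal inertia and column principal coordinates.

    Uses plain power iteration on S^T S instead of a full SVD, so only the
    leading dimension is recovered; the sign of the axis is arbitrary.
    """
    k = len(c)
    M = [[sum(row[a] * row[b] for row in S) for b in range(k)] for a in range(k)]
    v = [1.0 + j for j in range(k)]            # arbitrary non-constant start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                        # rows and columns independent
            return 0.0, [0.0] * k
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue (principal inertia);
    # clamp tiny negative rounding error to zero before taking sqrt.
    lam = max(sum(v[a] * M[a][b] * v[b] for a in range(k) for b in range(k)), 0.0)
    coords = [math.sqrt(lam) * v[j] / math.sqrt(c[j]) for j in range(k)]
    return lam, coords


counts = [[20, 5, 10],    # made-up 3 x 3 table of counts
          [5, 25, 10],
          [10, 10, 30]]
S, r, c = standardized_residuals(counts)
lam1, col_coords = first_axis(S, c)
print("total inertia:", total_inertia(S))
print("first principal inertia:", lam1)
print("column coordinates on axis 1:", col_coords)
```

The total inertia equals Pearson's chi-squared statistic divided by the grand total. Each axis accounts for a share of it, much as each principal component accounts for a share of total variance in PCA.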

On Fri, 6 Mar 2009, Galanidis Alexandros wrote:

> Hi all,
>
> I'm trying to figure out whether it is appropriate to do a PCA on purely categorical (not ordinal) data. The only thing I have found is the following quote:
>
> One method to find such relationships is to select appropriate variables and
> to view the data using a method like Principal Components Analysis (PCA) [4].
> This approach gives us a clear picture of the data using KL-plot of the PCA.
> However, the method is not settled for the data including categorical data.
> [http://hp.vector.co.jp/authors/VA038807/personal/covEigGiniRep17.pdf]
>
> but I'm still not sure whether it is WRONG to do so.

Since normally categorical data is taken to be binomial or Poisson 
distributed, the variance varies with the mean and least-squares (the 
basis of PCA) is then sub-optimal.  Correspondence analysis takes that 
into account (at least to some extent).
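A small pure-Python illustration of the mean-variance point follows. Python is used here only to keep the example self-contained, and nothing below comes from the original post. The code computes the Poisson variance directly from the probability mass function for several means. It also computes the variance of a 0/1 indicator column for a rare and a common category. The variance tracks the mean, so unweighted least squares implicitly gives more weight to high-count cells. Dividing (observed - expected) by sqrt(expected), as the Pearson residuals used in correspondence analysis do, puts the cells on a roughly common scale.

```python
import math


def poisson_moments(mu, kmax=400):
    """Mean and variance of a Poisson(mu) variable, summed from its pmf."""
    pmf = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mu / k)          # P(K=k) = P(K=k-1) * mu / k
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


def bernoulli_variance(p):
    """Variance of a 0/1 dummy column for a category with proportion p."""
    return p * (1 - p)


for mu in (0.5, 5.0, 50.0):
    mean, var = poisson_moments(mu)
    # The Pearson residual (k - mu) / sqrt(mu) has variance var / mu = 1 for every mu.
    print(f"Poisson mean {mu:>5}: variance {var:.3f}, "
          f"variance of (k - mu)/sqrt(mu) {var / mu:.3f}")

for p in (0.05, 0.5):
    print(f"0/1 indicator with proportion {p}: variance {bernoulli_variance(p):.4f}")
```

The standardization only equalizes variances approximately for real tables, because the expected counts are themselves estimated. That is in line with the "at least to some extent" above.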

> Any opinion or reference would be very helpful

There is a basic introduction in MASS4 (Venables and Ripley, Modern Applied
Statistics with S, 4th edition), with references to more comprehensive accounts.

> thanks
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list