[R] PCA and categorical data
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 6 10:25:26 CET 2009
You might want to look into correspondence analysis, a family of
variants of PCA designed for categorical data.
On Fri, 6 Mar 2009, Galanidis Alexandros wrote:
> Hi all,
>
> I'm trying to figure out whether it is appropriate to do a PCA when all my data are categorical (not ordinal). I have only found the following quote:
>
> One method to find such relationships is to select appropriate variables and
> to view the data using a method like Principal Components Analysis (PCA) [4].
> This approach gives us a clear picture of the data using KL-plot of the PCA.
> However, the method is not settled for the data including categorical data.
> [http://hp.vector.co.jp/authors/VA038807/personal/covEigGiniRep17.pdf]
>
> but I'm still not sure whether it is WRONG to do so.
Since categorical data are normally modelled as binomial or Poisson
distributed, their variance varies with the mean, and least squares (the
basis of PCA) is then sub-optimal. Correspondence analysis takes that
into account (at least to some extent).
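A minimal sketch of how simple correspondence analysis builds this in, written in Python purely for illustration (in R itself one would normally use an existing implementation such as `corresp()` in the MASS package). It takes a two-way contingency table, forms the standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j), where r and c are the row and column masses, and finds the leading singular value and vectors of that matrix. Dividing by the square root of the expected proportion is the chi-square weighting that allows for the variance depending on the mean. The function name `ca_first_axis`, the restriction to the first axis, and the use of power iteration in place of a full SVD are assumptions made to keep the example short and dependency-free; it also assumes no row or column has a zero total.

```python
import math


def ca_first_axis(table, iters=1000):
    """Hypothetical helper: simple correspondence analysis of a two-way
    contingency table (list of lists of counts), first axis only.
    Assumes every row and column has a positive total.

    Returns (total_inertia, first_principal_inertia, row_coords, col_coords).
    """
    n = sum(sum(row) for row in table)
    P = [[x / n for x in row] for row in table]   # correspondence matrix
    r = [sum(row) for row in P]                   # row masses
    c = [sum(col) for col in zip(*P)]             # column masses
    nr, nc = len(r), len(c)

    # Standardized residuals: dividing by sqrt(expected proportion) is the
    # chi-square weighting, instead of putting every cell on one raw
    # least-squares scale as plain PCA on indicator data would.
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(nc)]
         for i in range(nr)]
    total = sum(s * s for row in S for s in row)  # equals chi-square / n

    # Power iteration on S^T S for the leading right singular vector v
    # (a stand-in for a full SVD, to stay within the standard library).
    v = [float(j + 1) for j in range(nc)]
    for _ in range(iters):
        u = [sum(S[i][j] * v[j] for j in range(nc)) for i in range(nr)]
        w = [sum(S[i][j] * u[i] for i in range(nr)) for j in range(nc)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # exact independence: no structure to display
            return total, 0.0, [0.0] * nr, [0.0] * nc
        v = [x / norm for x in w]

    u = [sum(S[i][j] * v[j] for j in range(nc)) for i in range(nr)]  # S v
    sigma = math.sqrt(sum(x * x for x in u))      # leading singular value
    # Principal coordinates: rows (S v)_i / sqrt(r_i),
    # columns sigma * v_j / sqrt(c_j).
    rows = [u[i] / math.sqrt(r[i]) for i in range(nr)]
    cols = [sigma * v[j] / math.sqrt(c[j]) for j in range(nc)]
    return total, sigma ** 2, rows, cols
```

For example, `ca_first_axis([[10, 20], [30, 40]])` gives a total inertia of 1/126 (the chi-square statistic 50/63 divided by n = 100); a 2 x 2 table has only one non-trivial axis, so the first principal inertia equals the total.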
> Any opinion or reference would be very helpful
There is a basic introduction in MASS4 (Venables and Ripley, Modern
Applied Statistics with S, 4th edition), with references to more
comprehensive accounts.
> thanks
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list