[R] Principal component analysis
ripley@stats.ox.ac.uk
Mon Dec 9 12:15:06 CET 2002
On Mon, 9 Dec 2002 Arne.Muller at aventis.com wrote:
> Dear R users,
>
> I'm trying to cluster 30 gene chips using principal component analysis in
> package mva.prcomp. Each chip is a point with 1,000 dimensions. PCA is
> probably just one of several methods to cluster the 30 chips. However, I
> don't know how to run prcomp, and I don't know how to interpret its output.
>
> If there are 30 data points in 1,000 dimensions each, do I have to provide
> the data in a 1,000x30 matrix or data frame (i.e. 1000 columns)?
None of those. A 30x1000 matrix: 30 rows (one per chip) and 1000 columns.
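
A minimal sketch of that orientation, assuming simulated values in place
of the real chip data (the name `genes.by.chips` is a hypothetical
stand-in; in current versions of R, prcomp is in package stats rather
than mva):

```r
# Simulated stand-in for the chip data: 1000 genes (rows) by 30 chips (columns).
set.seed(1)
genes.by.chips <- as.data.frame(matrix(rnorm(1000 * 30), nrow = 1000, ncol = 30))

# prcomp treats rows as observations and columns as variables,
# so transpose to get one row per chip and one column per gene.
m <- t(genes.by.chips)
dim(m)        # 30 rows, 1000 columns

pca <- prcomp(m, retx = TRUE)
nrow(pca$x)   # one row of scores per chip
```
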
> > data[1:5,1:5]
>   x.HU.04h.Ctr.118.01.4.ctrl x.HU.04h.010.118.04.4.0.1
> 1                         21                        45
> 2                         24                        35
> 3                        109                       173
> 4                         86                        99
> 5                        130                       204
>   x.HU.04h.050.118.05.4.0.5 x.HU.04h.100.118.06.4.1
> 1                        24                      28
> 2                        25                      25
> 3                       107                     125
> 4                        72                      79
> 5                       126                     166
>   x.HU.24h.Ctr.118.07.24.ctrl
> 1                          22
> 2                          20
> 3                          95
> 4                          61
> 5                         128
>
> > m <- t(data)
> > m[1:5,1:5]
>                              1  2   3  4   5
> x.HU.04h.Ctr.118.01.4.ctrl  21 24 109 86 130
> x.HU.04h.010.118.04.4.0.1   45 35 173 99 204
> x.HU.04h.050.118.05.4.0.5   24 25 107 72 126
> x.HU.04h.100.118.06.4.1     28 25 125 79 166
> x.HU.24h.Ctr.118.07.24.ctrl 22 20  95 61 128
>
> > pca <- prcomp(m, retx = TRUE)
>
> There are 30 "PC"s displayed (I've truncated the output). Shouldn't there be
> 1000 PCs, with the 1st PC being the most discriminative PC? In a principal
> component analysis, aren't there as many PCs as dimensions?
No. 970 of them span the null space: you have massive over-fitting.
> On the other hand I
> thought that PCA somehow collapses dimensionality ... . What are the PCs for
> my 30 data points? Afterwards I'd also like to display the results in a
> diagram, e.g. in 2 or 3 dimensions, to visualise clusters. I'm not sure I'm
> doing the right thing.
Well, statistically neither am I. But mathematically at least, the PCs
for your 30 data points are the `x' component of the result, and you can
plot them via
plot(pca$x[, 1:2])
in two dimensions, or use scatterplot3d (a package) or (preferably as it
is dynamic) the ggobi or xgobi interfaces in 3D.
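
A sketch of the two-dimensional plot with each chip labelled, assuming
simulated data and hypothetical row names ("chip1" to "chip30") rather
than the real chip names:

```r
# Simulated 30 x 1000 matrix standing in for the transposed chip data.
set.seed(1)
m <- matrix(rnorm(30 * 1000), nrow = 30,
            dimnames = list(paste0("chip", 1:30), NULL))
pca <- prcomp(m, retx = TRUE)

# Scores on the first two PCs: a 30 x 2 matrix, one row per chip.
scores <- pca$x[, 1:2]

# Set up empty axes, then draw each chip's name at its position.
plot(scores, xlab = "PC1", ylab = "PC2", type = "n")
text(scores, labels = rownames(scores), cex = 0.7)
```

Note the comma in `pca$x[, 1:2]`: without it, the subscript picks the
first two elements of the matrix rather than its first two columns.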
This sort of thing *is* covered in many of the texts about S (or S-PLUS or
R).
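
On the earlier point about the null space, a small check can make it
concrete. This is only a sketch on simulated data, and the tolerance
1e-8 is an arbitrary assumption used to decide what counts as zero:

```r
set.seed(1)
n <- 30     # chips (observations)
p <- 1000   # dimensions (variables)
m <- matrix(rnorm(n * p), nrow = n)
pca <- prcomp(m)

# Number of components returned: it cannot exceed the number of rows.
n.returned <- length(pca$sdev)

# Centring the columns removes one more degree of freedom, so at most
# n - 1 components can have a standard deviation that is not
# numerically zero.
n.nonzero <- sum(pca$sdev > 1e-8 * pca$sdev[1])

c(returned = n.returned, nonzero = n.nonzero)
```

All remaining directions in the 1000-dimensional space carry no
variation for these 30 points.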
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595