[R] Principal component analysis
Arne.Muller@aventis.com
Mon Dec 9 11:39:03 CET 2002
Dear R users,
I'm trying to cluster 30 gene chips using principal component analysis with
prcomp from package mva. Each chip is a point with 1,000 dimensions. PCA is
probably just one of several methods to cluster the 30 chips. However, I
don't know how to run prcomp, and I don't know how to interpret its output.
If there are 30 data points in 1,000 dimensions each, do I have to provide
the data as a 30 x 1,000 matrix or data frame (i.e. 1,000 columns)?
> data[1:5,1:5]
  x.HU.04h.Ctr.118.01.4.ctrl x.HU.04h.010.118.04.4.0.1
1                         21                        45
2                         24                        35
3                        109                       173
4                         86                        99
5                        130                       204
  x.HU.04h.050.118.05.4.0.5 x.HU.04h.100.118.06.4.1 x.HU.24h.Ctr.118.07.24.ctrl
1                        24                      28                          22
2                        25                      25                          20
3                       107                     125                          95
4                        72                      79                          61
5                       126                     166                         128
> m <- t(data)
> m[1:5,1:5]
                              1   2   3   4   5
x.HU.04h.Ctr.118.01.4.ctrl   21  24 109  86 130
x.HU.04h.010.118.04.4.0.1    45  35 173  99 204
x.HU.04h.050.118.05.4.0.5    24  25 107  72 126
x.HU.04h.100.118.06.4.1      28  25 125  79 166
x.HU.24h.Ctr.118.07.24.ctrl  22  20  95  61 128
> pca <- prcomp(m, retx = TRUE)
There are 30 "PC"s displayed (I've truncated the output). Shouldn't there be
1,000 PCs, with the 1st PC being the most discriminative PC? In a principal
component analysis, aren't there as many PCs as dimensions? On the other hand,
I thought that PCA somehow collapses dimensionality. What are the PCs for
my 30 data points? Afterwards I'd also like to display the results in a
diagram, e.g. in 2 or 3 dimensions, to visualise clusters. I'm not sure I'm
doing the right thing.
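To make the question about the number of PCs concrete, here is a minimal sketch on simulated data (not the chip measurements). It assumes a current R version, where prcomp is in the stats package and needs no library() call. prcomp treats rows as observations and columns as variables, so 30 points in 1,000 dimensions go in as a 30 x 1,000 matrix. The names n_chips, n_genes, sim and pca_sim are made up for illustration.

```r
set.seed(1)                      # reproducible simulated data
n_chips <- 30                    # observations (rows)
n_genes <- 1000                  # variables (columns)
sim <- matrix(rnorm(n_chips * n_genes), nrow = n_chips, ncol = n_genes)

pca_sim <- prcomp(sim, retx = TRUE)   # columns are centred by default

dim(pca_sim$x)          # scores: 30 rows (chips) by 30 columns (PCs)
dim(pca_sim$rotation)   # loadings: 1000 rows (genes) by 30 columns (PCs)
length(pca_sim$sdev)    # 30 components, i.e. min(n_chips, n_genes)

# Proportion of the total variance carried by each component
var_explained <- pca_sim$sdev^2 / sum(pca_sim$sdev^2)
round(var_explained[1:5], 3)

# Centring removes one degree of freedom from the 30 observations,
# so the last component has essentially zero standard deviation
pca_sim$sdev[n_chips]
```

With fewer observations than variables, the 30 points span at most a 29-dimensional subspace after centring, which is why the number of components follows the number of chips rather than the 1,000 dimensions.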
I'm happy for any comments and explanations,
kind regards,
Arne
> pca["x"]
$x
                                   PC1          PC2         PC3         PC4        PC5         PC6
x.HU.04h.Ctr.118.01.4.ctrl  -1272.1203  -249.465634 -2185.20558  1083.15814  421.67755   100.26612
x.HU.04h.010.118.04.4.0.1   -1493.8623  1483.260490 -1090.31102  -286.70562 1274.34804    37.88463
x.HU.04h.050.118.05.4.0.5   -2688.5157  2055.336930   -83.70279   154.24116 1202.58763  -604.08124
x.HU.04h.100.118.06.4.1     -2477.3271  2029.248507   -14.37922  -314.08755 1422.88800  -509.37791
x.HU.24h.Ctr.118.07.24.ctrl -3198.7071 -2264.516725   209.04504   763.56664 -762.61481  -542.35302
x.HU.24h.010.118.10.24.0.1  -3370.0556 -2190.205040   298.17498   702.80862 -783.48849  -509.22595
x.HU.24h.050.118.11.24.0.5  -2662.8329 -1436.400955  1478.81635   129.83910  406.10451   337.88507
x.HU.24h.100.118.12.24.1    -4193.3836 -1210.594052  1844.22923   914.84373  -11.33207    11.58916
x.HU.04h.Ctr.206.13.4.ctrl   2305.5848  -180.584730 -2017.05340  1274.07436  132.14756   930.35799
x.HU.04h.010.206.14.4.0.1    1703.4976  2032.883878   -78.67578  1697.50799 -301.93647   234.25139
x.HU.04h.025.206.15.4.0.25   1294.1932  2876.862370   534.11002  1229.73355  -68.31220   226.47566
x.HU.04h.050.206.16.4.0.5    3666.8441  3520.249397  1187.37289   -45.83772 -271.06706   145.75181
x.HU.04h.100.206.17.4.1      3657.9687  3432.347857  1318.94834  -484.73817 -405.36077   349.88323
x.HU.24h.Ctr.206.18.24.ctrl  5796.1801 -2985.085353 -1052.08033  -306.45667  265.22940  -732.59152
x.HU.24h.010.206.19.24.0.1   4429.6809 -2685.801572 -1027.66157   822.76848  171.15959 -1118.12987
x.HU.24h.025.206.20.24.0.25  5672.4279 -1559.896071  1177.74742  -734.37026  336.46183  -132.25625
x.HU.24h.050.206.21.24.0.5   4855.8534  -809.112994  1825.99459  -594.09109  190.00907  -234.33254
x.HU.24h.100.206.22.24.1     4015.2594  -166.349964  1015.96643   622.86202 -267.17075   400.45741
x.HU.04h.Ctr.821.23.4.ctrl   -485.9779    91.410337 -2446.35100  -263.83351 -453.89005   491.14145
x.HU.04h.Ctr.821.24.4.ctrl    390.5580    -8.264721 -2707.56580 -1265.35762 -156.67885   555.41157
x.HU.04h.010.821.25.4.0.1   -1138.4096  1733.090222  -885.89460  -460.04065 -276.68619  -200.20132
x.HU.04h.025.821.26.4.0.25  -1622.0565  2333.333749  -297.50664  -838.12742 -783.19740  -206.76327
x.HU.04h.050.821.27.4.0.5   -1920.9992  2462.596326  -213.80507  -463.02219 -683.90138  -731.04753
x.HU.04h.100.821.28.4.1     -2288.0687  2251.971783   223.28215  -472.78173 -668.16917  -623.88411
x.HU.24h.Ctr.821.29.24.ctrl  -599.7405 -2105.800732  -792.89966  -902.43731 -158.37800   314.34868
x.HU.24h.Ctr.821.30.24.ctrl  -743.5533 -2154.937309  -350.37118  -744.69040 -479.01087   172.03340
x.HU.24h.010.821.31.24.0.1  -2240.3848 -1963.626249   306.05426  -178.59331 -166.16473   266.24216
x.HU.24h.025.821.32.24.0.25 -1840.1627 -1667.075636  1271.79029  -333.21614 -178.28014   477.06373
x.HU.24h.050.821.33.24.0.5  -1575.7248 -1431.615872  1059.90748  -531.84286  537.76332   502.46140
x.HU.24h.100.821.34.24.1    -1976.1656 -1233.258236  1492.02417  -175.17357  515.26288   590.73966
[...]
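A score matrix like the one above could be shown in two dimensions roughly as follows. This is only a sketch on simulated data with an artificial three-group structure. The objects grp, shift, sim, scores, hc and clusters are hypothetical. Hierarchical clustering with hclust on the first five PCs is just one possible choice and is not taken from the post.

```r
set.seed(2)
# Hypothetical stand-in for the chip data: 30 samples x 1000 variables,
# three groups of 10 whose first 50 variables have shifted means
grp   <- rep(c("A", "B", "C"), each = 10)
shift <- c(A = 0, B = 2, C = -2)
sim   <- matrix(rnorm(30 * 1000), nrow = 30)
sim[, 1:50] <- sim[, 1:50] + shift[grp]   # same shift for every row of a group
rownames(sim) <- paste(grp, 1:30, sep = ".")

pca_sim <- prcomp(sim)
scores  <- pca_sim$x[, 1:2]               # coordinates on PC1 and PC2

# Scatter plot of the samples in the plane of the first two PCs
plot(scores, col = as.integer(factor(grp)), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Samples on the first two PCs")
text(scores, labels = rownames(sim), pos = 3, cex = 0.6)

# One possible clustering step: hierarchical clustering on the leading scores
hc       <- hclust(dist(pca_sim$x[, 1:5]))
clusters <- cutree(hc, k = 3)
table(grp, clusters)                      # compare clusters with known groups
```

For the real data, the same calls would be applied to pca$x instead of the simulated scores. The number of PCs to keep and the number of clusters are choices to check against the data.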