[R] Principal component analysis

Arne.Muller at aventis.com
Mon Dec 9 11:39:03 CET 2002


Dear R users,

I'm trying to cluster 30 gene chips using principal component analysis with
prcomp from package mva. Each chip is a point in 1,000 dimensions. PCA is
probably just one of several methods for clustering the 30 chips. However, I
don't know how to run prcomp, and I don't know how to interpret its output.

If there are 30 data points with 1,000 dimensions each, do I have to provide
the data as a 30x1,000 matrix or data frame (i.e. 30 rows and 1,000 columns)?

> data[1:5,1:5]
  x.HU.04h.Ctr.118.01.4.ctrl x.HU.04h.010.118.04.4.0.1
1                         21                        45
2                         24                        35
3                        109                       173
4                         86                        99
5                        130                       204
  x.HU.04h.050.118.05.4.0.5 x.HU.04h.100.118.06.4.1 x.HU.24h.Ctr.118.07.24.ctrl
1                        24                      28                          22
2                        25                      25                          20
3                       107                     125                          95
4                        72                      79                          61
5                       126                     166                         128

> m <- t(data)
> m[1:5,1:5]
                             1  2   3  4   5
x.HU.04h.Ctr.118.01.4.ctrl  21 24 109 86 130
x.HU.04h.010.118.04.4.0.1   45 35 173 99 204
x.HU.04h.050.118.05.4.0.5   24 25 107 72 126
x.HU.04h.100.118.06.4.1     28 25 125 79 166
x.HU.24h.Ctr.118.07.24.ctrl 22 20  95 61 128

> pca <- prcomp(m, retx = TRUE)
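To make the layout question concrete, here is a minimal sketch on random stand-in data (not the chip data). The names `n_chips`, `n_genes`, `toy` and `toy_pca` are made up for illustration. It assumes prcomp is available without further setup; in the R version used in this post, `library(mva)` would have to be loaded first.

```r
# Random stand-in for the chip data: 30 chips (rows) x 1000 genes (columns).
# The values are arbitrary; only the shape of the matrix matters here.
set.seed(1)
n_chips <- 30
n_genes <- 1000
toy <- matrix(rnorm(n_chips * n_genes), nrow = n_chips, ncol = n_genes)

# prcomp treats each row as one observation and each column as one variable,
# so the chips go in the rows, as in m <- t(data) above.
toy_pca <- prcomp(toy, retx = TRUE)

dim(toy_pca$x)         # scores: one row per chip
dim(toy_pca$rotation)  # loadings: one row per gene
```

With this orientation the rows of `toy_pca$x` correspond to chips and the rows of `toy_pca$rotation` correspond to genes.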

There are 30 "PC"s displayed (I've truncated the output). Shouldn't there be
1,000 PCs, with the 1st PC being the most discriminative one? In a principal
component analysis, aren't there as many PCs as dimensions? On the other hand, I
thought that PCA somehow collapses dimensionality ... . What are the PCs for
my 30 data points? Afterwards I'd also like to display the results in a
diagram, e.g. in 2 or 3 dimensions, to visualise clusters. I'm not sure I'm
doing the right thing.
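
The kind of check and diagram I have in mind can be sketched as follows, again on random stand-in data rather than the chips. The names `toy`, `toy_pca`, `n_pcs` and `var_share` are hypothetical. The sketch counts the components prcomp returns for 30 points in 1,000 dimensions, looks at how much variance each one carries, and plots the points on the first two components. As far as I can tell, 30 centred points span at most 29 directions, which would explain why far fewer than 1,000 components come back.

```r
# Random stand-in data: 30 points (rows) in 1000 dimensions (columns).
set.seed(1)
toy <- matrix(rnorm(30 * 1000), nrow = 30, ncol = 1000)
toy_pca <- prcomp(toy, retx = TRUE)

# Number of components returned and the share of total variance of each.
n_pcs <- ncol(toy_pca$x)
var_share <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)

# Components are ordered by decreasing standard deviation; after centring,
# the last of the 30 carries essentially no variance.
round(var_share[1:5], 3)

# Scatter plot of the 30 points on the first two components,
# labelled by row number.
plot(toy_pca$x[, 1], toy_pca$x[, 2],
     xlab = "PC1", ylab = "PC2",
     main = "Stand-in data on the first two principal components")
text(toy_pca$x[, 1], toy_pca$x[, 2],
     labels = seq_len(nrow(toy)), pos = 3, cex = 0.7)
```

For the real data the same two plotting calls would use pca$x, with rownames(pca$x) as labels.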

	I'm happy for any comments and explanations,

	kind regards,

	Arne


> pca["x"]
$x
                                     PC1          PC2         PC3         PC4        PC5         PC6
  x.HU.04h.Ctr.118.01.4.ctrl  -1272.1203  -249.465634 -2185.20558  1083.15814  421.67755   100.26612
  x.HU.04h.010.118.04.4.0.1   -1493.8623  1483.260490 -1090.31102  -286.70562 1274.34804    37.88463
  x.HU.04h.050.118.05.4.0.5   -2688.5157  2055.336930   -83.70279   154.24116 1202.58763  -604.08124
  x.HU.04h.100.118.06.4.1     -2477.3271  2029.248507   -14.37922  -314.08755 1422.88800  -509.37791
  x.HU.24h.Ctr.118.07.24.ctrl -3198.7071 -2264.516725   209.04504   763.56664 -762.61481  -542.35302
  x.HU.24h.010.118.10.24.0.1  -3370.0556 -2190.205040   298.17498   702.80862 -783.48849  -509.22595
  x.HU.24h.050.118.11.24.0.5  -2662.8329 -1436.400955  1478.81635   129.83910  406.10451   337.88507
  x.HU.24h.100.118.12.24.1    -4193.3836 -1210.594052  1844.22923   914.84373  -11.33207    11.58916
  x.HU.04h.Ctr.206.13.4.ctrl   2305.5848  -180.584730 -2017.05340  1274.07436  132.14756   930.35799
  x.HU.04h.010.206.14.4.0.1    1703.4976  2032.883878   -78.67578  1697.50799 -301.93647   234.25139
  x.HU.04h.025.206.15.4.0.25   1294.1932  2876.862370   534.11002  1229.73355  -68.31220   226.47566
  x.HU.04h.050.206.16.4.0.5    3666.8441  3520.249397  1187.37289   -45.83772 -271.06706   145.75181
  x.HU.04h.100.206.17.4.1      3657.9687  3432.347857  1318.94834  -484.73817 -405.36077   349.88323
  x.HU.24h.Ctr.206.18.24.ctrl  5796.1801 -2985.085353 -1052.08033  -306.45667  265.22940  -732.59152
  x.HU.24h.010.206.19.24.0.1   4429.6809 -2685.801572 -1027.66157   822.76848  171.15959 -1118.12987
  x.HU.24h.025.206.20.24.0.25  5672.4279 -1559.896071  1177.74742  -734.37026  336.46183  -132.25625
  x.HU.24h.050.206.21.24.0.5   4855.8534  -809.112994  1825.99459  -594.09109  190.00907  -234.33254
  x.HU.24h.100.206.22.24.1     4015.2594  -166.349964  1015.96643   622.86202 -267.17075   400.45741
  x.HU.04h.Ctr.821.23.4.ctrl   -485.9779    91.410337 -2446.35100  -263.83351 -453.89005   491.14145
  x.HU.04h.Ctr.821.24.4.ctrl    390.5580    -8.264721 -2707.56580 -1265.35762 -156.67885   555.41157
  x.HU.04h.010.821.25.4.0.1   -1138.4096  1733.090222  -885.89460  -460.04065 -276.68619  -200.20132
  x.HU.04h.025.821.26.4.0.25  -1622.0565  2333.333749  -297.50664  -838.12742 -783.19740  -206.76327
  x.HU.04h.050.821.27.4.0.5   -1920.9992  2462.596326  -213.80507  -463.02219 -683.90138  -731.04753
  x.HU.04h.100.821.28.4.1     -2288.0687  2251.971783   223.28215  -472.78173 -668.16917  -623.88411
  x.HU.24h.Ctr.821.29.24.ctrl  -599.7405 -2105.800732  -792.89966  -902.43731 -158.37800   314.34868
  x.HU.24h.Ctr.821.30.24.ctrl  -743.5533 -2154.937309  -350.37118  -744.69040 -479.01087   172.03340
  x.HU.24h.010.821.31.24.0.1  -2240.3848 -1963.626249   306.05426  -178.59331 -166.16473   266.24216
  x.HU.24h.025.821.32.24.0.25 -1840.1627 -1667.075636  1271.79029  -333.21614 -178.28014   477.06373
  x.HU.24h.050.821.33.24.0.5  -1575.7248 -1431.615872  1059.90748  -531.84286  537.76332   502.46140
  x.HU.24h.100.821.34.24.1    -1976.1656 -1233.258236  1492.02417  -175.17357  515.26288   590.73966

[...]



