[R] after PCA, the pc values are so large, wrong?

bbslover dluthm at yeah.net
Sat Nov 7 13:50:31 CET 2009


rm(list=ls())
yx.df<-read.csv("c:/MK-2-72.csv",sep=',',header=T,dec='.')
dim(yx.df)
#get X matrix
y<-yx.df[,1]
x<-yx.df[,2:643]
#convert to matrix
mat<-as.matrix(x)
#get row number
rownum<-nrow(mat)
#remove the constant parameters
mat1<-mat[,apply(mat,2,function(.col)!(all(.col[1]==.col[2:rownum])))]
dim(yx.df)
dim(mat1)
#remove columns in which more than 95% of the values are zero
mat2<-mat1[,apply(mat1,2,function(.col)!(sum(.col==0)/rownum>0.95))]
dim(yx.df)
dim(mat2)
#remove columns with sd<0.5
mat3<-mat2[,apply(mat2,2,function(.col)!all(sd(.col)<0.5))]
dim(yx.df)
dim(mat3)
#PCA analysis
mat3.pr<-prcomp(mat3,cor=T)
summary(mat3.pr,loading=T)
pre.cmp<-predict(mat3.pr)
cmp<-pre.cmp[,1:3]
cmp
DF<-cbind(y,cmp)
DF<-as.data.frame(DF)
names(DF)<-c('y','p1','p2','p3')
DF
summary(lm(y~p1+p2+p3,data=DF))
mat3.pr<-prcomp(DF,cor=T)
summary(mat3.pr)
pre<-predict(mat3.pr)
pre1<-pre[,1:3]
pre1
colnames(pre1)<-c("x1","x2","x3")
pre1
pc<-cbind(y,pre1)
pc<-as.data.frame(pc)
lm.pc<-lm(y~x1+x2+x3,data=pc)
summary(lm.pc)

Above is my PCA code. After running it, the scores on the first three PCs are
very large. Why? The fit (R-squared) is also poor. Below are my scores on the
first three PCs.
> pre1
              PC1          PC2          PC3
 [1,] -15181.5190  1944.392700 -1074.326182
 [2,] -32152.4533  1007.113729  3201.361408
 [3,] -15836.5362  2117.988273  -555.799383
 [4,]  -1618.5561  1481.020337   255.530132
 [5,]  -5407.5030  1975.779398   -84.646283
 [6,]  -9662.1949  2611.220928  -417.435782
 [7,] -30488.2102   577.385588  1853.420297
 [8,]  -2135.2563 -4506.112873  1382.413284
 [9,]  -1584.2796 -4645.142062   929.146895
[10,]   -668.7664 -4876.250486   177.691446
[11,]  -2188.5914 -4495.203080  1432.428127
[12,] -19633.9581  2159.000138 -1598.710872
[13,] -26849.1088  -515.574085 -2683.552623
[14,]  -9492.9503 -4868.648205  1236.986097
[15,] -13857.6517 -4810.228193  1296.342199
[16,] -11596.5097 -8181.631403   462.913210
[17,] -25948.6564  -746.442386 -3415.426682
[18,]  15386.4477   709.974524   555.160973
[19,]  21642.7516  1163.456075  -609.437740
[20,]  22236.7094   675.562564  -136.992578
[21,]  14354.9927   611.996274    -4.867054
[22,]  12569.9493  1111.842240   585.540985
[23,]  20739.0219  3078.679745  1662.902248
[24,]   9472.0249   648.769910   381.487034
[25,]  17299.5307  1424.712428  1522.311676
[26,]  13231.2735   587.761915   170.448061
[27,]  10843.5590   705.485396   -79.931518
[28,]   9402.8803 -1978.216853 -1534.244078
[29,]  13094.9525   212.042937  -363.941664
[30,]   9337.3522   537.885230   189.558999
[31,]   7747.1347  -141.004825 -1664.082447
[32,]   4640.1161 -1489.652284 -3584.574135
[33,]  13241.5054   175.630689  -486.250927
[34,]   3867.2204   814.830143  1584.358007
[35,]   8614.5030   708.274447   814.295587
[36,] -18815.6774  -480.311541  1248.369916
[37,]  -1860.0810  1195.557861   269.322703
[38,]   7172.0057     4.216905 -1191.448702
[39,]  -7233.2271 -2361.951658  -235.293358
[40,]   1841.3548  1187.225488   632.116420
[41,]  12465.2336   367.822405   160.751014
[42,] -39021.7259  1972.333778  3167.504098
[43,]  13098.7736  -424.152058  -567.846037
[44,]   9793.7729  -559.084900  -210.696126
[45,]  13111.1861    22.772626  -318.242722
[46,]  13169.0604     7.808885  -363.995563
[47,]   3306.6293  -694.908211  -642.996604
[48,]  10779.8582  -989.175596 -1619.861931
[49,]  10872.6913  -747.979343 -1375.317959
[50,]  -3057.5633  1838.449143  1454.886518
[51,]  -6854.9316  2338.753165  1113.510561
[52,] -15077.1823  1917.776905 -1158.158633
[53,] -45862.8305  1173.157521 -1707.293955
[54,] -14294.1553  1716.708462 -1794.064434
[55,]  24645.0508  2519.904889  1424.233563
[56,]  23303.5998  2250.088386   839.587354
[57,]  18865.5231   897.566446    36.240598
[58,]    227.2659 -6582.661199  -712.892569
[59,]  15336.8371   722.953549   593.903314
[60,]  13030.8715   228.509670  -312.933654
[61,]   5826.0388   331.077814   -53.417878
[62,]  13150.4446  -437.612023  -608.342969
[63,]  11728.3897   -83.151510   569.007995
[64,]  11021.5720  -869.425283 -1216.724017
[65,]   9625.3142   137.388994   138.735249
[66,] -15905.2704  3735.547166   421.846379
[67,] -15539.7628  3331.399648   104.886572
[68,]  -2294.9924  1648.164750   822.075221
[69,] -10120.0153  1558.766306  -333.378256
[70,] -24241.4554  -533.700229  1516.603088
[71,]  -1036.6022 -4782.136067   475.195011
[72,] -24575.2244  2655.599986 -1965.946921
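
One possible cause of scores this large, offered as a sketch rather than a
diagnosis: prcomp() has no `cor` argument (that one belongs to princomp()), so
`cor=T` does not standardize anything and the PCA is run on centered but
unscaled columns. In prcomp() standardization is requested with
`scale.=TRUE`. The example below uses made-up data (`toy` is hypothetical, not
the MK-2-72 file) whose columns are on very different scales, and compares the
two calls.

```r
set.seed(1)
# Hypothetical data: three columns on very different scales
toy <- cbind(a = rnorm(50, sd = 10000),
             b = rnorm(50, sd = 100),
             c = rnorm(50, sd = 1))

# Centered only (prcomp's default, scale. = FALSE)
pr.raw <- prcomp(toy)
# Centered and scaled to unit variance, i.e. PCA on the correlation matrix
pr.std <- prcomp(toy, scale. = TRUE)

# Spread of the first-PC scores in each case
range(pr.raw$x[, 1])
range(pr.std$x[, 1])

# Standard deviations of the components
pr.raw$sdev
pr.std$sdev
```

Without scaling, the first component mostly follows whichever column has the
largest variance, and the scores inherit that column's units. With
`scale.=TRUE` the component variances sum to the number of columns.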

The fit result is below:
Call:
lm(formula = y ~ x1 + x2 + x3, data = pc)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.29638 -0.47622  0.01059  0.49268  1.69335 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
x2          -4.095e-05  3.448e-05  -1.188    0.239    
x3          -8.106e-05  6.412e-05  -1.264    0.210    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.691 on 68 degrees of freedom
Multiple R-squared: 0.3644,     Adjusted R-squared: 0.3364 
F-statistic: 12.99 on 3 and 68 DF,  p-value: 8.368e-07 

x2 and x3 are not significant. In principle, after PCA, the PCs should be
significant, but with my data they are not. Why?
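
A point that may be relevant here: PCA is computed from the predictors alone
and never looks at y. The components are ordered by how much variance of X they
carry, not by how strongly they relate to the response, so nothing guarantees
that any given PC is significant in a regression. The sketch below uses
simulated data (all names such as `z1`, `z2`, `resp` are hypothetical) in which
the response depends only on the low-variance predictor.

```r
set.seed(2)
n <- 100
# Hypothetical predictors: z1 has large variance, z2 has small variance
z1 <- rnorm(n, sd = 5)
z2 <- rnorm(n, sd = 1)
# The response depends only on the low-variance predictor z2
resp <- 2 * z2 + rnorm(n, sd = 0.5)

pr <- prcomp(cbind(z1, z2))   # PCA sees only the predictors, not resp
sc <- as.data.frame(pr$x)     # scores, with columns PC1 and PC2
fit <- lm(resp ~ PC1 + PC2, data = sc)
coef(summary(fit))            # compare the p-values of PC1 and PC2
```

In this setup the component that carries most of the variance (PC1) is not the
one most strongly related to the response. Whether a PC is significant depends
on the relation between that direction in X and y, which PCA does not use.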



