[R] how to tell if its better to standardize your data matrix first when you do principal

masterinex xevilgang79 at hotmail.com
Mon Nov 23 03:42:46 CET 2009


This is what my data matrix looks like. Only the first 10 observations are
shown, but the remaining observations follow a similar pattern.


1   12.3 154.25 67.75 36.2  93.1  85.2  94.5 59.0 37.3 21.9 32.0 27.4 17.1
2    6.1 173.25 72.25 38.5  93.6  83.0  98.7 58.7 37.3 23.4 30.5 28.9 18.2
3   25.3 154.00 66.25 34.0  95.8  87.9  99.2 59.6 38.9 24.0 28.8 25.2 16.6
4   10.4 184.75 72.25 37.4 101.8  86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2
5   28.7 184.25 71.25 34.4  97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7
6   20.9 210.25 74.75 39.0 104.5  94.4 107.8 66.0 42.0 25.6 35.7 30.6 18.8
7   19.2 181.00 69.75 36.4 105.1  90.7 100.3 58.4 38.3 22.9 31.9 27.8 17.7
8   12.4 176.00 72.50 37.8  99.6  88.5  97.1 60.0 39.4 23.2 30.5 29.0 18.8
9    4.1 191.00 74.00 38.1 100.9  82.5  99.9 62.9 38.3 23.8 35.9 31.1 18.2
10  11.7 198.25 73.50 42.1  99.6  88.6 104.1 63.1 41.7 25.0 35.6 30.0 19.2


And this is the same data after standardizing (only the first 11 of the 13 columns are shown):

1   -0.831228836 -0.898881671 -0.98330178 -0.77420686 -0.952294055 -0.712961621 -0.814552365 -0.0625400993 -0.53901713 -0.825399059 -0.08244945
2   -1.588060506 -0.185928394  0.75868364  0.23560461 -0.889886435 -0.931523054 -0.155497233 -0.1252522485 -0.53901713  0.295114747 -0.59529632
3    0.755676279 -0.908262635 -1.56396359 -1.74011349 -0.615292906 -0.444727135 -0.077038289  0.0628841989  0.15515266  0.743320270 -1.17652277
4   -1.063161122  0.245595958  0.75868364 -0.24734870  0.133598535 -0.593746294  0.236797489  0.1674044475 -0.53901713 -0.153090775  0.05430971
5    1.170713001  0.226834030  0.37157577 -1.56449410 -0.428070046  0.757360745  0.346640011  0.8154299886  1.58687786  0.743320270 -0.01406987
6    0.218569932  1.202454304  1.72645331  0.45512884  0.470599683  0.201022552  1.272455554  1.4007433805  1.50010664  1.938534997  1.18257281
7    0.011051571  0.104881496 -0.20908604 -0.68639717  0.545488828 -0.166558039  0.095571389 -0.1879643976 -0.10516101 -0.078389855 -0.11663925
8   -0.819021874 -0.082737788  0.85546060 -0.07172932 -0.140994994 -0.385119472 -0.406565855  0.1465003978  0.37208072  0.145712907 -0.59529632
9   -1.832199755  0.480120063  1.43612241  0.05998522  0.021264819 -0.981196107  0.032804234  0.7527178395 -0.10516101  0.593918429  1.25095239
10  -0.904470611  0.752168024  1.24256848  1.81617909 -0.140994994 -0.375184861  0.691859366  0.7945259389  1.36994980  1.490329474  1.14838302
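
For reference, standardizing each column to mean 0 and standard deviation 1 can be sketched in R with scale(). This is only a sketch: it uses the first four columns of the ten rows shown above, and the names x, z and var1..var4 are illustrative labels. Because the standardized values in this post were presumably computed from the full data set, the output here will not match them.

```r
# First four columns of the ten rows shown above
# (the full data set is not part of this post).
x <- matrix(c(
  12.3, 154.25, 67.75, 36.2,
   6.1, 173.25, 72.25, 38.5,
  25.3, 154.00, 66.25, 34.0,
  10.4, 184.75, 72.25, 37.4,
  28.7, 184.25, 71.25, 34.4,
  20.9, 210.25, 74.75, 39.0,
  19.2, 181.00, 69.75, 36.4,
  12.4, 176.00, 72.50, 37.8,
   4.1, 191.00, 74.00, 38.1,
  11.7, 198.25, 73.50, 42.1
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, paste0("var", 1:4)))

# scale() subtracts each column mean and divides by each column sd
z <- scale(x)

round(colMeans(z), 10)  # column means after standardizing
apply(z, 2, sd)         # column standard deviations after standardizing
```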



This is the result of applying PCA to the original data matrix:

Standard deviations:
 [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
[10]  1.1633458  1.1118231  0.7847148  0.4802303

Rotation:
               PC1         PC2         PC3          PC4          PC5          PC6          PC7         PC8
var1    0.18110712 -0.74864138 -0.46070566 -0.365658769  0.192810075 -0.132529979  0.023764851  0.03674873
var2    0.86458284  0.34243386 -0.05766909 -0.235504989 -0.046075934  0.001493006 -0.024535011  0.13439659
var3    0.03765598  0.20097537 -0.15709612 -0.343218776 -0.295201121 -0.073295697 -0.086930370 -0.54389141
var4    0.05965733  0.01737951  0.09854179 -0.030801791  0.125735684  0.341795876 -0.001735808  0.37152696
var5    0.23845698 -0.20616399  0.68948870  0.025904812  0.391188182 -0.428933369 -0.101780281 -0.16965893
var6    0.29928369 -0.47394636  0.24791449  0.341235161 -0.511378719  0.447071255 -0.077534385 -0.13198544
var7    0.19503685  0.01385823 -0.24126047  0.531403827 -0.127426510 -0.410568454  0.608163973 -0.01265457
var8    0.13261863  0.06839078 -0.37740589  0.535332339  0.366103479  0.032376851 -0.574484605 -0.05645694
var9    0.06246705  0.04407384 -0.09545362  0.037993146 -0.036651080  0.012347288 -0.192976142 -0.13027876
var10   0.03027791  0.05533988 -0.03749859 -0.009257423  0.011026593 -0.010770032 -0.104041067  0.12125263
var11   0.07435322  0.04334969 -0.02666944  0.032036374  0.464035624  0.454970952  0.347507539 -0.60527541
var12   0.04328710  0.04731771  0.00360668 -0.054200633  0.275901346  0.297800123  0.324323749  0.30487145
var13   0.02095652  0.02146485  0.03598618 -0.022510780  0.005192075  0.103988977  0.031541374  0.07877455

               PC9         PC10         PC11        PC12         PC13
var1  -0.005328345  0.030549780 -0.049283616 -0.02211988  0.015660892
var2   0.170766596 -0.144031738  0.028862963  0.06984674  0.006293703
var3  -0.282549313  0.548650592  0.131284937 -0.14740722 -0.002384605
var4   0.024070488  0.614154008 -0.551480394 -0.03446124 -0.178123011
var5  -0.157551008  0.147685248  0.008044148 -0.04068258  0.007778992
var6  -0.058675551  0.006344813  0.130814072 -0.04088919 -0.028655330
var7  -0.099243751  0.171852216 -0.149231752 -0.06690208 -0.014693444
var8   0.006629025  0.199158097  0.187226774 -0.02511968  0.070896819
var9  -0.658214712 -0.320120384 -0.500003990  0.37630539 -0.023642902
var10 -0.259704149 -0.273030750 -0.074006053 -0.83676032 -0.348034215
var11  0.157450716 -0.148991117 -0.153561998 -0.08742543 -0.056513679
var12 -0.560837576  0.098418477  0.542670501  0.10593629 -0.007670188
var13 -0.110526479 -0.012776152 -0.165279275 -0.32037870  0.914832392




This is the result of applying PCA to the standardized data matrix:

Standard deviations:
 [1] 2.9252556 1.1792994 0.8623322 0.7219158 0.6812740 0.5863879 0.4981330 0.4630637 0.4414004 0.4212403
[11] 0.2776168 0.2208503 0.1366760

Rotation:
              PC1          PC2         PC3           PC4         PC5         PC6         PC7         PC8
var1    0.2214240 -0.528940022 -0.22438633 -0.0324934310  0.10237112 -0.47563754  0.33100129 -0.19102715
var2    0.3345528  0.023162612 -0.10713782 -0.0001760222  0.11352232  0.04469088 -0.10098447  0.18643834
var3    0.1517554  0.605551504 -0.38237721  0.0314469316  0.59507576 -0.18321494  0.08116801  0.08111090
var4    0.2862444 -0.018344029  0.34874004 -0.1945368511  0.29590927  0.30061030 -0.39160283 -0.20869249
var5    0.3027658 -0.244481933  0.03265146 -0.1559266926  0.12932226  0.02393963 -0.16226550  0.45698236
var6    0.3005716 -0.329554056 -0.13879142 -0.1626911071  0.11072123 -0.05063054 -0.06388229  0.08496036
var7    0.3160710 -0.061820244 -0.23144824  0.1247108501 -0.06038088  0.16065274 -0.18772748  0.07057902
var8    0.2973041  0.006421036 -0.17862551  0.3873606332 -0.28005086  0.34119818 -0.13590921 -0.16267799
var9    0.2955016  0.144234590 -0.26323414 -0.0068912717 -0.18117677 -0.01771120  0.03379585 -0.62830066
var10   0.2552571  0.326437989 -0.09749610 -0.2291093560 -0.61898234 -0.22847105  0.01411768  0.38312210
var11   0.2822210  0.016911093  0.28838652  0.4287108516  0.07554337  0.28403417  0.66673623  0.19445840
var12   0.2491444  0.135956228  0.53597029  0.3883062869 -0.01492335 -0.60228918 -0.26232244 -0.08966993
var13   0.2637809  0.185151550  0.33956904 -0.5971722620 -0.04476545  0.08083909  0.34854493 -0.20909842


              PC9        PC10         PC11        PC12         PC13
var1  -0.40247469  0.05379733  0.063919267  0.26040567  0.015743241
var2   0.07150091  0.02906931 -0.009540692  0.02481489  0.899751898
var3  -0.11290113  0.06735920  0.100968481 -0.03902708 -0.182276335
var4  -0.52110479 -0.28262405 -0.150175234  0.06709027 -0.070349152
var5   0.36282385 -0.25907897  0.461043958  0.30566521 -0.256838644
var6   0.13245560  0.04742256 -0.174886071 -0.81057186 -0.147622115
var7   0.17950233  0.40472605 -0.602790052  0.38468466 -0.223865462
var8  -0.24062368  0.33426221  0.545545641 -0.12880676 -0.077404092
var9   0.37912190 -0.49731546 -0.023067506  0.04355862 -0.002718371
var10 -0.34729467 -0.21088629 -0.112243026 -0.03892369 -0.069031092
var11 -0.01252875 -0.22996539 -0.162156246 -0.04827985 -0.052013577
var12  0.14733228  0.12821614  0.009932520 -0.05164105 -0.025625894
var13  0.15194616  0.45367703  0.139390086  0.04590545 -0.004970894
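
The standard deviations printed for the two analyses can be turned into proportions of variance by squaring them and dividing by their sum. A small sketch using only the numbers from the output above (the names sd_raw, sd_std, prop_raw and prop_std are illustrative):

```r
# Standard deviations as printed above for the PCA of the original matrix
sd_raw <- c(30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
            1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
            1.1118231, 0.7847148, 0.4802303)

# Standard deviations as printed above for the PCA of the standardized matrix
sd_std <- c(2.9252556, 1.1792994, 0.8623322, 0.7219158, 0.6812740,
            0.5863879, 0.4981330, 0.4630637, 0.4414004, 0.4212403,
            0.2776168, 0.2208503, 0.1366760)

# Proportion of variance = component variance / total variance
prop_raw <- sd_raw^2 / sum(sd_raw^2)
prop_std <- sd_std^2 / sum(sd_std^2)

round(prop_raw[1:3], 3)
round(prop_std[1:3], 3)

# For standardized variables the component variances add up to
# (approximately, given rounding) the number of variables.
sum(sd_std^2)
```

With these numbers the first component accounts for roughly 0.91 of the total variance before standardizing and roughly 0.66 after.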

In this case, is it better to standardize the matrix or leave it as it is?
Also, how do I compare which method gives the better result?
I also found that the proportion of variance explained by the first principal
component was reduced a lot after standardizing. Does that mean it is a bad
idea to standardize the matrix?
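
One way to see what standardizing changes is to run both variants side by side. The following is a minimal sketch, assuming prcomp() from base R's stats package and using only the first four columns of the ten rows shown at the top (so the numbers will differ from the output above; x, pca_raw and pca_std are illustrative names). With scale. = FALSE the decomposition is of the covariance matrix, so a variable with a much larger variance tends to dominate the first component, as var2 does in the first output above. With scale. = TRUE it is of the correlation matrix, and every variable contributes unit variance.

```r
# First four columns of the ten rows shown at the top of the post
x <- matrix(c(
  12.3, 154.25, 67.75, 36.2,
   6.1, 173.25, 72.25, 38.5,
  25.3, 154.00, 66.25, 34.0,
  10.4, 184.75, 72.25, 37.4,
  28.7, 184.25, 71.25, 34.4,
  20.9, 210.25, 74.75, 39.0,
  19.2, 181.00, 69.75, 36.4,
  12.4, 176.00, 72.50, 37.8,
   4.1, 191.00, 74.00, 38.1,
  11.7, 198.25, 73.50, 42.1
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, paste0("var", 1:4)))

# Raw variances: very unequal variances are what make the two analyses differ
apply(x, 2, var)

pca_raw <- prcomp(x, center = TRUE, scale. = FALSE)  # covariance-based
pca_std <- prcomp(x, center = TRUE, scale. = TRUE)   # correlation-based

# Proportion of variance explained by each component
summary(pca_raw)$importance["Proportion of Variance", ]
summary(pca_std)$importance["Proportion of Variance", ]

# Loadings on the first component in each analysis
pca_raw$rotation[, 1]
pca_std$rotation[, 1]
```

The two sets of proportions are measured against different totals (the sum of the raw variances in one case, the number of variables in the other), so a smaller share for the first component after standardizing does not by itself show that one analysis is better than the other.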


Any suggestions are welcome.






-- 
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26471838.html
Sent from the R help mailing list archive at Nabble.com.



