[R] how to tell if it's better to standardize your data matrix first when you do principal
masterinex
xevilgang79 at hotmail.com
Mon Nov 23 03:42:46 CET 2009
This is what my data matrix looks like. These are just the first 10
observations, but the pattern is similar for the other observations.
     var1   var2   var3   var4   var5   var6   var7   var8   var9  var10  var11  var12  var13
 1   12.3 154.25  67.75   36.2   93.1   85.2   94.5   59.0   37.3   21.9   32.0   27.4   17.1
 2    6.1 173.25  72.25   38.5   93.6   83.0   98.7   58.7   37.3   23.4   30.5   28.9   18.2
 3   25.3 154.00  66.25   34.0   95.8   87.9   99.2   59.6   38.9   24.0   28.8   25.2   16.6
 4   10.4 184.75  72.25   37.4  101.8   86.4  101.2   60.1   37.3   22.8   32.4   29.4   18.2
 5   28.7 184.25  71.25   34.4   97.3  100.0  101.9   63.2   42.2   24.0   32.2   27.7   17.7
 6   20.9 210.25  74.75   39.0  104.5   94.4  107.8   66.0   42.0   25.6   35.7   30.6   18.8
 7   19.2 181.00  69.75   36.4  105.1   90.7  100.3   58.4   38.3   22.9   31.9   27.8   17.7
 8   12.4 176.00  72.50   37.8   99.6   88.5   97.1   60.0   39.4   23.2   30.5   29.0   18.8
 9    4.1 191.00  74.00   38.1  100.9   82.5   99.9   62.9   38.3   23.8   35.9   31.1   18.2
10   11.7 198.25  73.50   42.1   99.6   88.6  104.1   63.1   41.7   25.0   35.6   30.0   19.2
And this is the matrix after standardizing it:
           var1         var2        var3        var4         var5         var6         var7          var8        var9        var10       var11
 1 -0.831228836 -0.898881671 -0.98330178 -0.77420686 -0.952294055 -0.712961621 -0.814552365 -0.0625400993 -0.53901713 -0.825399059 -0.08244945
 2 -1.588060506 -0.185928394  0.75868364  0.23560461 -0.889886435 -0.931523054 -0.155497233 -0.1252522485 -0.53901713  0.295114747 -0.59529632
 3  0.755676279 -0.908262635 -1.56396359 -1.74011349 -0.615292906 -0.444727135 -0.077038289  0.0628841989  0.15515266  0.743320270 -1.17652277
 4 -1.063161122  0.245595958  0.75868364 -0.24734870  0.133598535 -0.593746294  0.236797489  0.1674044475 -0.53901713 -0.153090775  0.05430971
 5  1.170713001  0.226834030  0.37157577 -1.56449410 -0.428070046  0.757360745  0.346640011  0.8154299886  1.58687786  0.743320270 -0.01406987
 6  0.218569932  1.202454304  1.72645331  0.45512884  0.470599683  0.201022552  1.272455554  1.4007433805  1.50010664  1.938534997  1.18257281
 7  0.011051571  0.104881496 -0.20908604 -0.68639717  0.545488828 -0.166558039  0.095571389 -0.1879643976 -0.10516101 -0.078389855 -0.11663925
 8 -0.819021874 -0.082737788  0.85546060 -0.07172932 -0.140994994 -0.385119472 -0.406565855  0.1465003978  0.37208072  0.145712907 -0.59529632
 9 -1.832199755  0.480120063  1.43612241  0.05998522  0.021264819 -0.981196107  0.032804234  0.7527178395 -0.10516101  0.593918429  1.25095239
10 -0.904470611  0.752168024  1.24256848  1.81617909 -0.140994994 -0.375184861  0.691859366  0.7945259389  1.36994980  1.490329474  1.14838302
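A minimal sketch of the standardization step, assuming the usual z-score definition that R's scale() applies by default: subtract each column's mean and divide by its sample standard deviation (denominator n - 1). The helper name standardize and the five-row, three-column excerpt are illustrative only; because the values above were standardized over all observations, this excerpt will not reproduce them.

```python
from statistics import mean, stdev


def standardize(matrix):
    """Centre every column of `matrix` (a list of rows) to mean 0 and scale it
    to unit sample standard deviation (n - 1 denominator), like R's scale()."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in matrix
    ]


# Illustrative excerpt: var1..var3 of observations 1-5 from the raw data above.
raw = [
    [12.3, 154.25, 67.75],
    [6.1, 173.25, 72.25],
    [25.3, 154.00, 66.25],
    [10.4, 184.75, 72.25],
    [28.7, 184.25, 71.25],
]

z = standardize(raw)
for row in z:
    print(" ".join(f"{v:7.3f}" for v in row))
```

After this transformation every column has variance 1, so no variable can dominate the analysis merely because it is recorded on a larger numeric scale.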
This is the result of applying PCA to the original (unstandardized) data matrix:
Standard deviations:
 [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
[10]  1.1633458  1.1118231  0.7847148  0.4802303
Rotation:
               PC1          PC2          PC3          PC4          PC5          PC6          PC7          PC8
var1    0.18110712  -0.74864138  -0.46070566 -0.365658769  0.192810075 -0.132529979  0.023764851   0.03674873
var2    0.86458284   0.34243386  -0.05766909 -0.235504989 -0.046075934  0.001493006 -0.024535011   0.13439659
var3    0.03765598   0.20097537  -0.15709612 -0.343218776 -0.295201121 -0.073295697 -0.086930370  -0.54389141
var4    0.05965733   0.01737951   0.09854179 -0.030801791  0.125735684  0.341795876 -0.001735808   0.37152696
var5    0.23845698  -0.20616399   0.68948870  0.025904812  0.391188182 -0.428933369 -0.101780281  -0.16965893
var6    0.29928369  -0.47394636   0.24791449  0.341235161 -0.511378719  0.447071255 -0.077534385  -0.13198544
var7    0.19503685   0.01385823  -0.24126047  0.531403827 -0.127426510 -0.410568454  0.608163973  -0.01265457
var8    0.13261863   0.06839078  -0.37740589  0.535332339  0.366103479  0.032376851 -0.574484605  -0.05645694
var9    0.06246705   0.04407384  -0.09545362  0.037993146 -0.036651080  0.012347288 -0.192976142  -0.13027876
var10   0.03027791   0.05533988  -0.03749859 -0.009257423  0.011026593 -0.010770032 -0.104041067   0.12125263
var11   0.07435322   0.04334969  -0.02666944  0.032036374  0.464035624  0.454970952  0.347507539  -0.60527541
var12   0.04328710   0.04731771   0.00360668 -0.054200633  0.275901346  0.297800123  0.324323749   0.30487145
var13   0.02095652   0.02146485   0.03598618 -0.022510780  0.005192075  0.103988977  0.031541374   0.07877455
PC9 PC10 PC11 PC12 PC13
var1 -0.005328345 0.030549780 -0.049283616 -0.02211988 0.015660892
var2 0.170766596 -0.144031738 0.028862963 0.06984674 0.006293703
var3 -0.282549313 0.548650592 0.131284937 -0.14740722 -0.002384605
var4 0.024070488 0.614154008 -0.551480394 -0.03446124 -0.178123011
var5 -0.157551008 0.147685248 0.008044148 -0.04068258 0.007778992
var6 -0.058675551 0.006344813 0.130814072 -0.04088919 -0.028655330
var7 -0.099243751 0.171852216 -0.149231752 -0.06690208 -0.014693444
var8 0.006629025 0.199158097 0.187226774 -0.02511968 0.070896819
var9 -0.658214712 -0.320120384 -0.500003990 0.37630539 -0.023642902
var10 -0.259704149 -0.273030750 -0.074006053 -0.83676032 -0.348034215
var11 0.157450716 -0.148991117 -0.153561998 -0.08742543 -0.056513679
var12 -0.560837576 0.098418477 0.542670501 0.10593629 -0.007670188
var13 -0.110526479 -0.012776152 -0.165279275 -0.32037870 0.914832392
This is the result of applying PCA to the standardized data matrix:
Standard deviations:
 [1] 2.9252556 1.1792994 0.8623322 0.7219158 0.6812740 0.5863879 0.4981330 0.4630637 0.4414004 0.4212403
[11] 0.2776168 0.2208503 0.1366760
Rotation:
            PC1          PC2         PC3           PC4         PC5         PC6         PC7         PC8
var1  0.2214240 -0.528940022 -0.22438633 -0.0324934310  0.10237112 -0.47563754  0.33100129 -0.19102715
var2  0.3345528  0.023162612 -0.10713782 -0.0001760222  0.11352232  0.04469088 -0.10098447  0.18643834
var3  0.1517554  0.605551504 -0.38237721  0.0314469316  0.59507576 -0.18321494  0.08116801  0.08111090
var4  0.2862444 -0.018344029  0.34874004 -0.1945368511  0.29590927  0.30061030 -0.39160283 -0.20869249
var5  0.3027658 -0.244481933  0.03265146 -0.1559266926  0.12932226  0.02393963 -0.16226550  0.45698236
var6  0.3005716 -0.329554056 -0.13879142 -0.1626911071  0.11072123 -0.05063054 -0.06388229  0.08496036
var7  0.3160710 -0.061820244 -0.23144824  0.1247108501 -0.06038088  0.16065274 -0.18772748  0.07057902
var8  0.2973041  0.006421036 -0.17862551  0.3873606332 -0.28005086  0.34119818 -0.13590921 -0.16267799
var9  0.2955016  0.144234590 -0.26323414 -0.0068912717 -0.18117677 -0.01771120  0.03379585 -0.62830066
var10 0.2552571  0.326437989 -0.09749610 -0.2291093560 -0.61898234 -0.22847105  0.01411768  0.38312210
var11 0.2822210  0.016911093  0.28838652  0.4287108516  0.07554337  0.28403417  0.66673623  0.19445840
var12 0.2491444  0.135956228  0.53597029  0.3883062869 -0.01492335 -0.60228918 -0.26232244 -0.08966993
var13 0.2637809  0.185151550  0.33956904 -0.5971722620 -0.04476545  0.08083909  0.34854493 -0.20909842
PC9 PC10 PC11 PC12 PC13
var1 -0.40247469 0.05379733 0.063919267 0.26040567 0.015743241
var2 0.07150091 0.02906931 -0.009540692 0.02481489 0.899751898
var3 -0.11290113 0.06735920 0.100968481 -0.03902708 -0.182276335
var4 -0.52110479 -0.28262405 -0.150175234 0.06709027 -0.070349152
var5 0.36282385 -0.25907897 0.461043958 0.30566521 -0.256838644
var6 0.13245560 0.04742256 -0.174886071 -0.81057186 -0.147622115
var7 0.17950233 0.40472605 -0.602790052 0.38468466 -0.223865462
var8 -0.24062368 0.33426221 0.545545641 -0.12880676 -0.077404092
var9 0.37912190 -0.49731546 -0.023067506 0.04355862 -0.002718371
var10 -0.34729467 -0.21088629 -0.112243026 -0.03892369 -0.069031092
var11 -0.01252875 -0.22996539 -0.162156246 -0.04827985 -0.052013577
var12 0.14733228 0.12821614 0.009932520 -0.05164105 -0.025625894
var13 0.15194616 0.45367703 0.139390086 0.04590545 -0.004970894
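To see concretely why the two rotations differ: PCA on unstandardized data works with the covariance matrix, while PCA on standardized data works with the correlation matrix. The sketch below is an illustration under stated assumptions, not what prcomp() does internally (prcomp() uses a singular value decomposition). It swaps in power iteration, a deliberately simple way to get the leading eigenvector, assumes the largest eigenvalue is well separated, and uses only the ten observations shown above. The helper names are hypothetical.

```python
from math import sqrt

# The ten raw observations shown above (columns var1..var13).
rows = [
    [12.3, 154.25, 67.75, 36.2, 93.1, 85.2, 94.5, 59.0, 37.3, 21.9, 32.0, 27.4, 17.1],
    [6.1, 173.25, 72.25, 38.5, 93.6, 83.0, 98.7, 58.7, 37.3, 23.4, 30.5, 28.9, 18.2],
    [25.3, 154.00, 66.25, 34.0, 95.8, 87.9, 99.2, 59.6, 38.9, 24.0, 28.8, 25.2, 16.6],
    [10.4, 184.75, 72.25, 37.4, 101.8, 86.4, 101.2, 60.1, 37.3, 22.8, 32.4, 29.4, 18.2],
    [28.7, 184.25, 71.25, 34.4, 97.3, 100.0, 101.9, 63.2, 42.2, 24.0, 32.2, 27.7, 17.7],
    [20.9, 210.25, 74.75, 39.0, 104.5, 94.4, 107.8, 66.0, 42.0, 25.6, 35.7, 30.6, 18.8],
    [19.2, 181.00, 69.75, 36.4, 105.1, 90.7, 100.3, 58.4, 38.3, 22.9, 31.9, 27.8, 17.7],
    [12.4, 176.00, 72.50, 37.8, 99.6, 88.5, 97.1, 60.0, 39.4, 23.2, 30.5, 29.0, 18.8],
    [4.1, 191.00, 74.00, 38.1, 100.9, 82.5, 99.9, 62.9, 38.3, 23.8, 35.9, 31.1, 18.2],
    [11.7, 198.25, 73.50, 42.1, 99.6, 88.6, 104.1, 63.1, 41.7, 25.0, 35.6, 30.0, 19.2],
]


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of a list of rows."""
    n, p = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in data) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def correlation_matrix(cov):
    """Rescale a covariance matrix to correlations, which equals the
    covariance matrix of the standardized data."""
    p = len(cov)
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(m, iterations=1000):
    """Power iteration: repeatedly multiply by m and renormalize. For a
    positive semidefinite matrix this converges to the eigenvector of the
    largest eigenvalue, i.e. the PC1 loadings (up to sign)."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


cov = covariance_matrix(rows)
pc1_raw = leading_eigenvector(cov)
pc1_std = leading_eigenvector(correlation_matrix(cov))
for k, (a, b) in enumerate(zip(pc1_raw, pc1_std), start=1):
    print(f"var{k:<3} raw {a:7.3f}   standardized {b:7.3f}")
```

With only ten rows the numbers are rough, but the pattern should match the printouts. On the raw data, var2 has by far the largest variance of these columns and carries most of the PC1 weight (compare its raw PC1 loading of 0.86). On the standardized data the PC1 loadings are spread across all variables (in the standardized PC1 column above every loading lies between about 0.15 and 0.33).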
In this case, is it better to standardize the matrix or leave it as it is?
Also, how do I tell which approach gives the better result?
I also found that the proportion of variance explained by the first principal
component was reduced a lot after standardizing. Would that mean that it is a
bad idea to standardize the matrix?
Any suggestions are welcome.
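One way to quantify the drop described above is the proportion of variance each component explains: sdev_k^2 divided by the sum of all sdev_j^2. This is the "Proportion of Variance" row that summary() prints for a prcomp() fit. The small sketch below uses the standard deviations exactly as printed above. The helper name is illustrative.

```python
from itertools import accumulate

# Standard deviations copied from the two PCA printouts above.
sdev_raw = [30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
            1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
            1.1118231, 0.7847148, 0.4802303]
sdev_std = [2.9252556, 1.1792994, 0.8623322, 0.7219158, 0.6812740,
            0.5863879, 0.4981330, 0.4630637, 0.4414004, 0.4212403,
            0.2776168, 0.2208503, 0.1366760]


def proportion_of_variance(sdev):
    """Share of total variance for each component: sdev_k^2 / sum_j sdev_j^2."""
    variances = [s * s for s in sdev]
    total = sum(variances)
    return [v / total for v in variances]


for label, sdev in (("raw", sdev_raw), ("standardized", sdev_std)):
    prop = proportion_of_variance(sdev)
    cumulative = list(accumulate(prop))
    total = sum(s * s for s in sdev)
    print(f"{label:>12}: total variance {total:9.3f}, "
          f"PC1 {prop[0]:.3f}, PC1-PC3 cumulative {cumulative[2]:.3f}")
```

This gives a PC1 share of roughly 0.905 for the raw data and roughly 0.658 for the standardized data. The standardized variances add up to about 13, the number of variables, because every standardized column has variance 1. The raw total is dominated by the columns with the largest numeric spread. The two PC1 shares are therefore computed relative to different totals.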
--
View this message in context: http://old.nabble.com/how-to-tell-if-its-better-to-standardize-your-data-matrix-first-when-you-do-principal-tp26462070p26471838.html
Sent from the R help mailing list archive at Nabble.com.