[R] how to tell if it's better to standardize your data matrix first when you do principal
Uwe Ligges
ligges at statistik.tu-dortmund.de
Mon Nov 23 10:17:45 CET 2009
masterinex wrote:
> This is what my data matrix looks like. These are just the first 10
> observations, but the pattern is similar for the other observations.
>
>
> 1 12.3 154.25 67.75 36.2 93.1 85.2 94.5 59.0 37.3 21.9 32.0 27.4 17.1
> 2 6.1 173.25 72.25 38.5 93.6 83.0 98.7 58.7 37.3 23.4 30.5 28.9 18.2
> 3 25.3 154.00 66.25 34.0 95.8 87.9 99.2 59.6 38.9 24.0 28.8 25.2 16.6
> 4 10.4 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2
> 5 28.7 184.25 71.25 34.4 97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7
> 6 20.9 210.25 74.75 39.0 104.5 94.4 107.8 66.0 42.0 25.6 35.7 30.6 18.8
> 7 19.2 181.00 69.75 36.4 105.1 90.7 100.3 58.4 38.3 22.9 31.9 27.8 17.7
> 8 12.4 176.00 72.50 37.8 99.6 88.5 97.1 60.0 39.4 23.2 30.5 29.0 18.8
> 9 4.1 191.00 74.00 38.1 100.9 82.5 99.9 62.9 38.3 23.8 35.9 31.1 18.2
> 10 11.7 198.25 73.50 42.1 99.6 88.6 104.1 63.1 41.7 25.0 35.6 30.0 19.2
>
>
> And this is the matrix after standardizing it:
>
> 1 -0.831228836 -0.898881671 -0.98330178 -0.77420686 -0.952294055 -0.712961621 -0.814552365 -0.0625400993 -0.53901713 -0.825399059 -0.08244945
> 2 -1.588060506 -0.185928394 0.75868364 0.23560461 -0.889886435 -0.931523054 -0.155497233 -0.1252522485 -0.53901713 0.295114747 -0.59529632
> 3 0.755676279 -0.908262635 -1.56396359 -1.74011349 -0.615292906 -0.444727135 -0.077038289 0.0628841989 0.15515266 0.743320270 -1.17652277
> 4 -1.063161122 0.245595958 0.75868364 -0.24734870 0.133598535 -0.593746294 0.236797489 0.1674044475 -0.53901713 -0.153090775 0.05430971
> 5 1.170713001 0.226834030 0.37157577 -1.56449410 -0.428070046 0.757360745 0.346640011 0.8154299886 1.58687786 0.743320270 -0.01406987
> 6 0.218569932 1.202454304 1.72645331 0.45512884 0.470599683 0.201022552 1.272455554 1.4007433805 1.50010664 1.938534997 1.18257281
> 7 0.011051571 0.104881496 -0.20908604 -0.68639717 0.545488828 -0.166558039 0.095571389 -0.1879643976 -0.10516101 -0.078389855 -0.11663925
> 8 -0.819021874 -0.082737788 0.85546060 -0.07172932 -0.140994994 -0.385119472 -0.406565855 0.1465003978 0.37208072 0.145712907 -0.59529632
> 9 -1.832199755 0.480120063 1.43612241 0.05998522 0.021264819 -0.981196107 0.032804234 0.7527178395 -0.10516101 0.593918429 1.25095239
> 10 -0.904470611 0.752168024 1.24256848 1.81617909 -0.140994994 -0.375184861 0.691859366 0.7945259389 1.36994980 1.490329474 1.14838302
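The standardized matrix holds column z-scores. Each column has its mean subtracted and is then divided by its standard deviation. This is what R's scale() does by default, using the n - 1 sample standard deviation.

Below is a minimal Python sketch of that transformation, using only the standard library and three of the quoted columns. The helper name standardize is hypothetical. The quoted z-scores appear to have been computed over the full data set rather than only these ten rows, so the sketch will not reproduce them. It only illustrates the transformation and how different the raw column scales are.

```python
from statistics import mean, stdev

# Three columns (var1, var2, var13) of the first ten observations quoted above.
raw = {
    "var1": [12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7],
    "var2": [154.25, 173.25, 154.00, 184.75, 184.25,
             210.25, 181.00, 176.00, 191.00, 198.25],
    "var13": [17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2, 19.2],
}


def standardize(column):
    """Hypothetical helper: subtract the column mean, divide by the sample SD (n - 1)."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


z = {name: standardize(values) for name, values in raw.items()}

for name, values in raw.items():
    print(f"{name:>5}: raw sd {stdev(values):7.3f} | "
          f"standardized mean {mean(z[name]):+.1e}, sd {stdev(z[name]):.3f}")
```

After the transformation every column has mean 0 and standard deviation 1. Before it, var2 varies far more than var13, purely in the units it was recorded in.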
>
>
>
> This is the result of applying PCA to the data matrix:
>
> Standard deviations:
> [1] 30.6645414 7.5513852 3.6927427 2.8703435 2.5363007 1.9136933 1.5624131 1.3689630 1.2976189
> [10] 1.1633458 1.1118231 0.7847148 0.4802303
>
> Rotation:
> PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
> var1 0.18110712 -0.74864138 -0.46070566 -0.365658769 0.192810075 -0.132529979 0.023764851 0.03674873
> var2 0.86458284 0.34243386 -0.05766909 -0.235504989 -0.046075934 0.001493006 -0.024535011 0.13439659
> var3 0.03765598 0.20097537 -0.15709612 -0.343218776 -0.295201121 -0.073295697 -0.086930370 -0.54389141
> var4 0.05965733 0.01737951 0.09854179 -0.030801791 0.125735684 0.341795876 -0.001735808 0.37152696
> var5 0.23845698 -0.20616399 0.68948870 0.025904812 0.391188182 -0.428933369 -0.101780281 -0.16965893
> var6 0.29928369 -0.47394636 0.24791449 0.341235161 -0.511378719 0.447071255 -0.077534385 -0.13198544
> var7 0.19503685 0.01385823 -0.24126047 0.531403827 -0.127426510 -0.410568454 0.608163973 -0.01265457
> var8 0.13261863 0.06839078 -0.37740589 0.535332339 0.366103479 0.032376851 -0.574484605 -0.05645694
> var9 0.06246705 0.04407384 -0.09545362 0.037993146 -0.036651080 0.012347288 -0.192976142 -0.13027876
> var10 0.03027791 0.05533988 -0.03749859 -0.009257423 0.011026593 -0.010770032 -0.104041067 0.12125263
> var11 0.07435322 0.04334969 -0.02666944 0.032036374 0.464035624 0.454970952 0.347507539 -0.60527541
> var12 0.04328710 0.04731771 0.00360668 -0.054200633 0.275901346 0.297800123 0.324323749 0.30487145
> var13 0.02095652 0.02146485 0.03598618 -0.022510780 0.005192075 0.103988977 0.031541374 0.07877455
>
> PC9 PC10 PC11 PC12 PC13
> var1 -0.005328345 0.030549780 -0.049283616 -0.02211988 0.015660892
> var2 0.170766596 -0.144031738 0.028862963 0.06984674 0.006293703
> var3 -0.282549313 0.548650592 0.131284937 -0.14740722 -0.002384605
> var4 0.024070488 0.614154008 -0.551480394 -0.03446124 -0.178123011
> var5 -0.157551008 0.147685248 0.008044148 -0.04068258 0.007778992
> var6 -0.058675551 0.006344813 0.130814072 -0.04088919 -0.028655330
> var7 -0.099243751 0.171852216 -0.149231752 -0.06690208 -0.014693444
> var8 0.006629025 0.199158097 0.187226774 -0.02511968 0.070896819
> var9 -0.658214712 -0.320120384 -0.500003990 0.37630539 -0.023642902
> var10 -0.259704149 -0.273030750 -0.074006053 -0.83676032 -0.348034215
> var11 0.157450716 -0.148991117 -0.153561998 -0.08742543 -0.056513679
> var12 -0.560837576 0.098418477 0.542670501 0.10593629 -0.007670188
> var13 -0.110526479 -0.012776152 -0.165279275 -0.32037870 0.914832392
>
>
>
>
> This is the result of applying PCA to the standardized data matrix:
>
> Standard deviations:
> [1] 2.9252556 1.1792994 0.8623322 0.7219158 0.6812740 0.5863879 0.4981330 0.4630637 0.4414004 0.4212403
> [11] 0.2776168 0.2208503 0.1366760
>
> Rotation:
> PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
> var1 0.2214240 -0.528940022 -0.22438633 -0.0324934310 0.10237112 -0.47563754 0.33100129 -0.19102715
> var2 0.3345528 0.023162612 -0.10713782 -0.0001760222 0.11352232 0.04469088 -0.10098447 0.18643834
> var3 0.1517554 0.605551504 -0.38237721 0.0314469316 0.59507576 -0.18321494 0.08116801 0.08111090
> var4 0.2862444 -0.018344029 0.34874004 -0.1945368511 0.29590927 0.30061030 -0.39160283 -0.20869249
> var5 0.3027658 -0.244481933 0.03265146 -0.1559266926 0.12932226 0.02393963 -0.16226550 0.45698236
> var6 0.3005716 -0.329554056 -0.13879142 -0.1626911071 0.11072123 -0.05063054 -0.06388229 0.08496036
> var7 0.3160710 -0.061820244 -0.23144824 0.1247108501 -0.06038088 0.16065274 -0.18772748 0.07057902
> var8 0.2973041 0.006421036 -0.17862551 0.3873606332 -0.28005086 0.34119818 -0.13590921 -0.16267799
> var9 0.2955016 0.144234590 -0.26323414 -0.0068912717 -0.18117677 -0.01771120 0.03379585 -0.62830066
> var10 0.2552571 0.326437989 -0.09749610 -0.2291093560 -0.61898234 -0.22847105 0.01411768 0.38312210
> var11 0.2822210 0.016911093 0.28838652 0.4287108516 0.07554337 0.28403417 0.66673623 0.19445840
> var12 0.2491444 0.135956228 0.53597029 0.3883062869 -0.01492335 -0.60228918 -0.26232244 -0.08966993
> var13 0.2637809 0.185151550 0.33956904 -0.5971722620 -0.04476545 0.08083909 0.34854493 -0.20909842
>
>
> PC9 PC10 PC11 PC12 PC13
> var1 -0.40247469 0.05379733 0.063919267 0.26040567 0.015743241
> var2 0.07150091 0.02906931 -0.009540692 0.02481489 0.899751898
> var3 -0.11290113 0.06735920 0.100968481 -0.03902708 -0.182276335
> var4 -0.52110479 -0.28262405 -0.150175234 0.06709027 -0.070349152
> var5 0.36282385 -0.25907897 0.461043958 0.30566521 -0.256838644
> var6 0.13245560 0.04742256 -0.174886071 -0.81057186 -0.147622115
> var7 0.17950233 0.40472605 -0.602790052 0.38468466 -0.223865462
> var8 -0.24062368 0.33426221 0.545545641 -0.12880676 -0.077404092
> var9 0.37912190 -0.49731546 -0.023067506 0.04355862 -0.002718371
> var10 -0.34729467 -0.21088629 -0.112243026 -0.03892369 -0.069031092
> var11 -0.01252875 -0.22996539 -0.162156246 -0.04827985 -0.052013577
> var12 0.14733228 0.12821614 0.009932520 -0.05164105 -0.025625894
> var13 0.15194616 0.45367703 0.139390086 0.04590545 -0.004970894
>
> In this case, is it better to standardize the matrix or to leave it as it is?
> Also, how do I compare which method gives the better result?
> I also found that the proportion of variance of the first principal component
> was reduced a lot after standardizing. Would that mean that it is a bad idea
> to standardize the matrix?
>
> Any suggestions are welcome.
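R's summary() of a prcomp fit reports a "Proportion of Variance" for each component. That figure is the component's variance (its standard deviation squared) divided by the sum of those variances over all components.

The small Python sketch below uses only the standard deviations quoted in this thread to repeat that calculation for both runs. The helper name proportion_of_variance is hypothetical.

The sketch also suggests why the two first-component shares are not directly comparable:
- On the raw scale, the total variance is dominated by the variable with the largest variance, var2, whose loading on PC1 is 0.86 in the raw output.
- After standardizing, every variable contributes a variance of exactly 1, so the total should be 13.

```python
# Standard deviations printed by the two PCA runs quoted above.
sd_raw = [30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007, 1.9136933,
          1.5624131, 1.3689630, 1.2976189, 1.1633458, 1.1118231, 0.7847148,
          0.4802303]
sd_std = [2.9252556, 1.1792994, 0.8623322, 0.7219158, 0.6812740, 0.5863879,
          0.4981330, 0.4630637, 0.4414004, 0.4212403, 0.2776168, 0.2208503,
          0.1366760]


def proportion_of_variance(sdevs):
    """Hypothetical helper: each component's variance (sd squared) over the total."""
    variances = [s * s for s in sdevs]
    total = sum(variances)
    return [v / total for v in variances]


prop_raw = proportion_of_variance(sd_raw)
prop_std = proportion_of_variance(sd_std)
print(f"PC1 share, raw data:          {prop_raw[0]:.3f}")
print(f"PC1 share, standardized data: {prop_std[0]:.3f}")
# With standardized data each variable has variance 1, so the component
# variances should add up to the number of variables (13 here).
print(f"total variance, standardized: {sum(s * s for s in sd_std):.3f}")
```

For these numbers, PC1 accounts for roughly 90% of the variance on the raw scale and roughly 66% after standardizing. The smaller share mainly reflects that the raw-scale PC1 largely tracks var2. On its own it does not show that standardizing is a bad idea.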
I told you that you need to do the interpretation, and that it depends on the
variables (are they measured in the same units?). Nobody can give you an
answer without knowledge about the variables.
And note, as Hadley mentioned before, that there are certainly
statistical consultants available in your area, which is unknown to us
given that you post anonymously from some hotmail.com account. That is also
not very helpful for getting further answers...
Uwe Ligges
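Uwe's question about units can be illustrated numerically. The sketch below is plain Python using only the standard library. It takes the ten rows shown in the thread and finds the first principal component of two matrices:
- the covariance matrix, which is what prcomp(x) effectively decomposes;
- the correlation matrix, which corresponds to prcomp(x, scale. = TRUE), i.e. PCA on the standardized matrix.

Power iteration and the helper names are choices made for this sketch, not anything from the thread. With only ten observations, the loadings will not match the full-data output quoted above.

```python
from statistics import mean, stdev

# First ten observations from the thread: rows = observations, columns = var1..var13.
DATA = [
    [12.3, 154.25, 67.75, 36.2, 93.1, 85.2, 94.5, 59.0, 37.3, 21.9, 32.0, 27.4, 17.1],
    [6.1, 173.25, 72.25, 38.5, 93.6, 83.0, 98.7, 58.7, 37.3, 23.4, 30.5, 28.9, 18.2],
    [25.3, 154.00, 66.25, 34.0, 95.8, 87.9, 99.2, 59.6, 38.9, 24.0, 28.8, 25.2, 16.6],
    [10.4, 184.75, 72.25, 37.4, 101.8, 86.4, 101.2, 60.1, 37.3, 22.8, 32.4, 29.4, 18.2],
    [28.7, 184.25, 71.25, 34.4, 97.3, 100.0, 101.9, 63.2, 42.2, 24.0, 32.2, 27.7, 17.7],
    [20.9, 210.25, 74.75, 39.0, 104.5, 94.4, 107.8, 66.0, 42.0, 25.6, 35.7, 30.6, 18.8],
    [19.2, 181.00, 69.75, 36.4, 105.1, 90.7, 100.3, 58.4, 38.3, 22.9, 31.9, 27.8, 17.7],
    [12.4, 176.00, 72.50, 37.8, 99.6, 88.5, 97.1, 60.0, 39.4, 23.2, 30.5, 29.0, 18.8],
    [4.1, 191.00, 74.00, 38.1, 100.9, 82.5, 99.9, 62.9, 38.3, 23.8, 35.9, 31.1, 18.2],
    [11.7, 198.25, 73.50, 42.1, 99.6, 88.6, 104.1, 63.1, 41.7, 25.0, 35.6, 30.0, 19.2],
]


def standardize(column):
    """Z-scores using the sample standard deviation (n - 1)."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def covariance_matrix(cols):
    """Sample covariance (n - 1 denominator) between every pair of columns."""
    n = len(cols[0])
    means = [mean(c) for c in cols]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mb in zip(cols, means)]
            for ci, ma in zip(cols, means)]


def leading_eigenvector(matrix, iterations=1000):
    """Power iteration: repeatedly multiply a start vector by the matrix and renormalize."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


cols = [list(c) for c in zip(*DATA)]
z_cols = [standardize(c) for c in cols]

# Covariance of raw columns (R analogue: prcomp(x)).
pc1_raw = leading_eigenvector(covariance_matrix(cols))
# Covariance of z-scores = correlation matrix (R analogue: prcomp(x, scale. = TRUE)).
pc1_std = leading_eigenvector(covariance_matrix(z_cols))

for i, (a, b) in enumerate(zip(pc1_raw, pc1_std), start=1):
    print(f"var{i:<3} raw PC1 {a:+.3f}   standardized PC1 {b:+.3f}")
```

On these rows, the raw PC1 is dominated by var2, the column with by far the largest variance. This matches the full-data raw output, where var2 has a PC1 loading of 0.86. In the quoted standardized output, by contrast, every PC1 loading lies between about 0.15 and 0.34.

If the variables are in different units, or their variances differ only because of the units chosen, that dominance is an artifact of scale. This is the usual argument for standardizing. Whether it is an artifact here depends on what the variables are.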
More information about the R-help mailing list