[R] how to tell if it's better to standardize your data matrix first when you do principal

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Nov 23 10:17:45 CET 2009



masterinex wrote:
> This is what my data matrix looks like. These are just the first 10
> observations, but the pattern is similar for the other observations.
> 
> 
>  1  12.3  154.25  67.75  36.2   93.1   85.2   94.5  59.0  37.3  21.9  32.0  27.4  17.1
>  2   6.1  173.25  72.25  38.5   93.6   83.0   98.7  58.7  37.3  23.4  30.5  28.9  18.2
>  3  25.3  154.00  66.25  34.0   95.8   87.9   99.2  59.6  38.9  24.0  28.8  25.2  16.6
>  4  10.4  184.75  72.25  37.4  101.8   86.4  101.2  60.1  37.3  22.8  32.4  29.4  18.2
>  5  28.7  184.25  71.25  34.4   97.3  100.0  101.9  63.2  42.2  24.0  32.2  27.7  17.7
>  6  20.9  210.25  74.75  39.0  104.5   94.4  107.8  66.0  42.0  25.6  35.7  30.6  18.8
>  7  19.2  181.00  69.75  36.4  105.1   90.7  100.3  58.4  38.3  22.9  31.9  27.8  17.7
>  8  12.4  176.00  72.50  37.8   99.6   88.5   97.1  60.0  39.4  23.2  30.5  29.0  18.8
>  9   4.1  191.00  74.00  38.1  100.9   82.5   99.9  62.9  38.3  23.8  35.9  31.1  18.2
> 10  11.7  198.25  73.50  42.1   99.6   88.6  104.1  63.1  41.7  25.0  35.6  30.0  19.2
> 
> 
> And this is the same matrix after standardizing it:
> 
>  1 -0.831228836 -0.898881671 -0.98330178 -0.77420686 -0.952294055 -0.712961621 -0.814552365 -0.0625400993 -0.53901713 -0.825399059 -0.08244945
>  2 -1.588060506 -0.185928394  0.75868364  0.23560461 -0.889886435 -0.931523054 -0.155497233 -0.1252522485 -0.53901713  0.295114747 -0.59529632
>  3  0.755676279 -0.908262635 -1.56396359 -1.74011349 -0.615292906 -0.444727135 -0.077038289  0.0628841989  0.15515266  0.743320270 -1.17652277
>  4 -1.063161122  0.245595958  0.75868364 -0.24734870  0.133598535 -0.593746294  0.236797489  0.1674044475 -0.53901713 -0.153090775  0.05430971
>  5  1.170713001  0.226834030  0.37157577 -1.56449410 -0.428070046  0.757360745  0.346640011  0.8154299886  1.58687786  0.743320270 -0.01406987
>  6  0.218569932  1.202454304  1.72645331  0.45512884  0.470599683  0.201022552  1.272455554  1.4007433805  1.50010664  1.938534997  1.18257281
>  7  0.011051571  0.104881496 -0.20908604 -0.68639717  0.545488828 -0.166558039  0.095571389 -0.1879643976 -0.10516101 -0.078389855 -0.11663925
>  8 -0.819021874 -0.082737788  0.85546060 -0.07172932 -0.140994994 -0.385119472 -0.406565855  0.1465003978  0.37208072  0.145712907 -0.59529632
>  9 -1.832199755  0.480120063  1.43612241  0.05998522  0.021264819 -0.981196107  0.032804234  0.7527178395 -0.10516101  0.593918429  1.25095239
> 10 -0.904470611  0.752168024  1.24256848  1.81617909 -0.140994994 -0.375184861  0.691859366  0.7945259389  1.36994980  1.490329474  1.14838302
> 
> 
> 
> This is the result of applying PCA to the raw data matrix:
> 
> Standard deviations:
>  [1] 30.6645414  7.5513852  3.6927427  2.8703435  2.5363007  1.9136933  1.5624131  1.3689630  1.2976189
> [10]  1.1633458  1.1118231  0.7847148  0.4802303
> 
> Rotation:
>              PC1         PC2         PC3          PC4          PC5          PC6          PC7         PC8
> var1  0.18110712 -0.74864138 -0.46070566 -0.365658769  0.192810075 -0.132529979  0.023764851  0.03674873
> var2  0.86458284  0.34243386 -0.05766909 -0.235504989 -0.046075934  0.001493006 -0.024535011  0.13439659
> var3  0.03765598  0.20097537 -0.15709612 -0.343218776 -0.295201121 -0.073295697 -0.086930370 -0.54389141
> var4  0.05965733  0.01737951  0.09854179 -0.030801791  0.125735684  0.341795876 -0.001735808  0.37152696
> var5  0.23845698 -0.20616399  0.68948870  0.025904812  0.391188182 -0.428933369 -0.101780281 -0.16965893
> var6  0.29928369 -0.47394636  0.24791449  0.341235161 -0.511378719  0.447071255 -0.077534385 -0.13198544
> var7  0.19503685  0.01385823 -0.24126047  0.531403827 -0.127426510 -0.410568454  0.608163973 -0.01265457
> var8  0.13261863  0.06839078 -0.37740589  0.535332339  0.366103479  0.032376851 -0.574484605 -0.05645694
> var9  0.06246705  0.04407384 -0.09545362  0.037993146 -0.036651080  0.012347288 -0.192976142 -0.13027876
> var10 0.03027791  0.05533988 -0.03749859 -0.009257423  0.011026593 -0.010770032 -0.104041067  0.12125263
> var11 0.07435322  0.04334969 -0.02666944  0.032036374  0.464035624  0.454970952  0.347507539 -0.60527541
> var12 0.04328710  0.04731771  0.00360668 -0.054200633  0.275901346  0.297800123  0.324323749  0.30487145
> var13 0.02095652  0.02146485  0.03598618 -0.022510780  0.005192075  0.103988977  0.031541374  0.07877455
> 
>                PC9         PC10         PC11        PC12         PC13
> var1  -0.005328345  0.030549780 -0.049283616 -0.02211988  0.015660892
> var2   0.170766596 -0.144031738  0.028862963  0.06984674  0.006293703
> var3  -0.282549313  0.548650592  0.131284937 -0.14740722 -0.002384605
> var4   0.024070488  0.614154008 -0.551480394 -0.03446124 -0.178123011
> var5  -0.157551008  0.147685248  0.008044148 -0.04068258  0.007778992
> var6  -0.058675551  0.006344813  0.130814072 -0.04088919 -0.028655330
> var7  -0.099243751  0.171852216 -0.149231752 -0.06690208 -0.014693444
> var8   0.006629025  0.199158097  0.187226774 -0.02511968  0.070896819
> var9  -0.658214712 -0.320120384 -0.500003990  0.37630539 -0.023642902
> var10 -0.259704149 -0.273030750 -0.074006053 -0.83676032 -0.348034215
> var11  0.157450716 -0.148991117 -0.153561998 -0.08742543 -0.056513679
> var12 -0.560837576  0.098418477  0.542670501  0.10593629 -0.007670188
> var13 -0.110526479 -0.012776152 -0.165279275 -0.32037870  0.914832392
> 
> 
> 
> 
> This is the result of applying PCA to the standardized data matrix:
> 
> Standard deviations:
>  [1] 2.9252556 1.1792994 0.8623322 0.7219158 0.6812740 0.5863879 0.4981330 0.4630637 0.4414004 0.4212403
> [11] 0.2776168 0.2208503 0.1366760
> 
> Rotation:
>             PC1          PC2         PC3           PC4         PC5         PC6         PC7         PC8
> var1  0.2214240 -0.528940022 -0.22438633 -0.0324934310  0.10237112 -0.47563754  0.33100129 -0.19102715
> var2  0.3345528  0.023162612 -0.10713782 -0.0001760222  0.11352232  0.04469088 -0.10098447  0.18643834
> var3  0.1517554  0.605551504 -0.38237721  0.0314469316  0.59507576 -0.18321494  0.08116801  0.08111090
> var4  0.2862444 -0.018344029  0.34874004 -0.1945368511  0.29590927  0.30061030 -0.39160283 -0.20869249
> var5  0.3027658 -0.244481933  0.03265146 -0.1559266926  0.12932226  0.02393963 -0.16226550  0.45698236
> var6  0.3005716 -0.329554056 -0.13879142 -0.1626911071  0.11072123 -0.05063054 -0.06388229  0.08496036
> var7  0.3160710 -0.061820244 -0.23144824  0.1247108501 -0.06038088  0.16065274 -0.18772748  0.07057902
> var8  0.2973041  0.006421036 -0.17862551  0.3873606332 -0.28005086  0.34119818 -0.13590921 -0.16267799
> var9  0.2955016  0.144234590 -0.26323414 -0.0068912717 -0.18117677 -0.01771120  0.03379585 -0.62830066
> var10 0.2552571  0.326437989 -0.09749610 -0.2291093560 -0.61898234 -0.22847105  0.01411768  0.38312210
> var11 0.2822210  0.016911093  0.28838652  0.4287108516  0.07554337  0.28403417  0.66673623  0.19445840
> var12 0.2491444  0.135956228  0.53597029  0.3883062869 -0.01492335 -0.60228918 -0.26232244 -0.08966993
> var13 0.2637809  0.185151550  0.33956904 -0.5971722620 -0.04476545  0.08083909  0.34854493 -0.20909842
> 
> 
>               PC9        PC10         PC11        PC12         PC13
> var1  -0.40247469  0.05379733  0.063919267  0.26040567  0.015743241
> var2   0.07150091  0.02906931 -0.009540692  0.02481489  0.899751898
> var3  -0.11290113  0.06735920  0.100968481 -0.03902708 -0.182276335
> var4  -0.52110479 -0.28262405 -0.150175234  0.06709027 -0.070349152
> var5   0.36282385 -0.25907897  0.461043958  0.30566521 -0.256838644
> var6   0.13245560  0.04742256 -0.174886071 -0.81057186 -0.147622115
> var7   0.17950233  0.40472605 -0.602790052  0.38468466 -0.223865462
> var8  -0.24062368  0.33426221  0.545545641 -0.12880676 -0.077404092
> var9   0.37912190 -0.49731546 -0.023067506  0.04355862 -0.002718371
> var10 -0.34729467 -0.21088629 -0.112243026 -0.03892369 -0.069031092
> var11 -0.01252875 -0.22996539 -0.162156246 -0.04827985 -0.052013577
> var12  0.14733228  0.12821614  0.009932520 -0.05164105 -0.025625894
> var13  0.15194616  0.45367703  0.139390086  0.04590545 -0.004970894
> 
> In this case, is it better to standardize the matrix or leave it as it is?
> Also, how do I compare which method gives the better result?
> I also found that the proportion of variance of the first principal
> component was reduced a lot after standardizing. Would that mean that it
> is a bad idea to standardize the matrix?
>
> Any suggestions are welcome.
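The "proportion" asked about is each component's variance (its squared standard deviation) divided by the sum over all components; assuming the printouts come from R's `prcomp()`, calling `summary()` on the fitted object reports it as "Proportion of Variance". As a minimal sketch, written in Python/NumPy purely for illustration (the helper `proportion_of_variance` is a hypothetical name), the shares can be recomputed from the two sets of standard deviations quoted above:

```python
import numpy as np

# Standard deviations copied from the two PCA printouts in the question.
sdev_raw = np.array([30.6645414, 7.5513852, 3.6927427, 2.8703435, 2.5363007,
                     1.9136933, 1.5624131, 1.3689630, 1.2976189, 1.1633458,
                     1.1118231, 0.7847148, 0.4802303])
sdev_std = np.array([2.9252556, 1.1792994, 0.8623322, 0.7219158, 0.6812740,
                     0.5863879, 0.4981330, 0.4630637, 0.4414004, 0.4212403,
                     0.2776168, 0.2208503, 0.1366760])

def proportion_of_variance(sdev):
    """Hypothetical helper: share of total variance carried by each component."""
    var = sdev ** 2            # component variances are the squared sdevs
    return var / var.sum()

prop_raw = proportion_of_variance(sdev_raw)
prop_std = proportion_of_variance(sdev_std)

print(f"PC1 share, raw data:          {prop_raw[0]:.3f}")
print(f"PC1 share, standardized data: {prop_std[0]:.3f}")
# Each standardized column has variance 1, so the total equals the
# number of variables (13 here).
print(f"total variance, standardized: {np.sum(sdev_std ** 2):.2f}")
```

Run as-is, this reports a PC1 share of about 0.905 for the raw matrix and 0.658 for the standardized one, with a standardized total of 13.00, one unit of variance per column. A smaller PC1 share after standardizing is therefore not in itself a sign of a worse analysis: in the raw solution PC1 has a loading of 0.86 on var2, the column with the largest values and the widest spread in the rows shown (154 to 210), so much of that 90% reflects the scale of a single variable.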


I told you that you need to do the interpretation yourself, and that it
depends on the variables (are they measured in the same units?). Nobody
can give you an answer without knowing what the variables are.
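The point about units can be illustrated with a small experiment. The following minimal Python/NumPy sketch uses hypothetical synthetic data (not the poster's matrix) and a hypothetical helper `pca_variances`. Standardizing every column turns the PCA into a PCA of the correlation matrix, so re-expressing a variable in different units leaves the result unchanged, whereas PCA on the raw covariance matrix is pulled toward whichever variable has the largest numerical variance:

```python
import numpy as np

def pca_variances(x, standardize):
    """Hypothetical helper: component variances (eigenvalues of the
    covariance matrix of x), optionally after z-scoring each column."""
    x = x - x.mean(axis=0)                # centre each column
    if standardize:
        x = x / x.std(axis=0, ddof=1)     # unit sample variance, like R's scale()
    cov = np.cov(x, rowvar=False)         # columns are variables
    eigvals = np.linalg.eigvalsh(cov)     # ascending order for symmetric matrices
    return eigvals[::-1]                  # largest first

# Hypothetical synthetic data: two correlated columns plus one independent one.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
data = np.column_stack([
    z + rng.normal(scale=0.5, size=n),
    z + rng.normal(scale=0.5, size=n),
    rng.normal(size=n),
])

# The same data with the first variable expressed in units 1000 times smaller.
rescaled = data.copy()
rescaled[:, 0] *= 1000.0

for label, standardize in [("covariance (raw)", False),
                           ("correlation (standardized)", True)]:
    before = pca_variances(data, standardize)
    after = pca_variances(rescaled, standardize)
    print(f"{label}: PC1 share {before[0] / before.sum():.3f}"
          f" -> {after[0] / after.sum():.3f}")
```

Run as-is, the covariance-based PC1 share jumps to about 1.000 once the first variable is multiplied by 1000, while the standardized share prints the same before and after. That is the usual argument for standardizing when variables are measured in different units; when they share a unit and their raw variances are meaningful, the unstandardized analysis may be the more interpretable one.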

And note, as Hadley mentioned before, that there are certainly
statistical consultants available in your area, although we cannot know
where that is, given that you post anonymously from some hotmail.com
account, which is also not very helpful for getting further answers.

Uwe Ligges



