[R] Error in principal component loadings calculation
David L Carlson
dcarlson at tamu.edu
Tue Sep 15 00:36:34 CEST 2015
The quickest way to get that is
> summary(pc)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation 1.1984186 1.1020461 0.9522896 0.9119038 0.7815774
Proportion of Variance 0.2872414 0.2429011 0.1813711 0.1663137 0.1221726
Cumulative Proportion 0.2872414 0.5301425 0.7115137 0.8778274 1.0000000
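The "Proportion of Variance" row printed by summary() can be reproduced from pc$sdev alone. Each component's variance (its squared standard deviation, i.e. its eigenvalue) is divided by the total. This is a minimal sketch that assumes the same simulated data as the example further down the thread (set.seed(42) and a 20 x 5 runif matrix). The names eig, prop and cumprop are illustrative only.

```r
# Simulated data, as in the example later in the thread
set.seed(42)
data <- matrix(runif(100), 20, 5)
pc <- princomp(data, cor = TRUE)

# Component variances (eigenvalues of the correlation matrix)
eig <- pc$sdev^2

# Share of the total variance carried by each component, plus the running total
prop <- eig / sum(eig)
cumprop <- cumsum(prop)

rbind("Proportion of Variance" = prop, "Cumulative Proportion" = cumprop)
```

With cor = TRUE the eigenvalues sum to the number of variables (here 5), so prop is simply eig / 5.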
David
-----Original Message-----
From: Marcelo Kittlein [mailto:kittlein at mdp.edu.ar]
Sent: Monday, September 14, 2015 1:28 PM
To: David L Carlson <dcarlson at tamu.edu>
Subject: Re: [R] Error in principal component loadings calculation
Thanks David
I thought that "Proportion Var" was the proportion of the variance of
successive component scores, i.e. the one you get with "summary" of the
princomp object.
Proportion Var 0.2 0.2 0.2 0.2 0.2
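The "Proportion Var" row comes from the print method for loadings objects, not from summary(). As far as I can tell from base R's print.loadings, that method divides each column's sum of squared loadings by the number of rows (variables). princomp() returns unit-length eigenvectors, so every column sums to 1. The row is therefore always 1/p, which is 0.2 for five variables, whatever the data. In the sketch below, prop_var is a hypothetical helper that mimics that calculation. The two unrelated random data sets give the same row.

```r
# Hypothetical helper mimicking the "Proportion Var" row: colSums(L^2) / nrow(L)
prop_var <- function(L) colSums(unclass(L)^2) / nrow(L)

# Two unrelated random data sets with 5 variables each
set.seed(1)
pc_a <- princomp(matrix(runif(100), 20, 5), cor = TRUE)
pc_b <- princomp(matrix(rnorm(250), 50, 5), cor = TRUE)

# Unit-length eigenvectors: both give 0.2 for every component
prop_var(loadings(pc_a))
prop_var(loadings(pc_b))
```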
On 14/09/15 21:07, David L Carlson wrote:
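> The squared loadings in each column will always sum to 1 because princomp() returns the eigenvectors of the correlation matrix scaled to unit length; equivalently, they are the structure loadings divided by the standard deviation of each component. The terminology for principal components is not as consistent as we could hope. What princomp() calls loadings are these unit-length eigenvectors, which are also the coefficients that turn the standardized variables into the component scores. What many texts call the loadings is the structure matrix (the correlation between each variable and the component); because the components are uncorrelated, it is also the pattern matrix (the regression coefficients for reproducing the standardized variables from the standardized component scores). You are probably looking for that matrix, which is easy to obtain by multiplying each column by its component's standard deviation: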
>
>> set.seed(42)
>> data <- matrix(runif(100), 20, 5)
>> pc <- princomp(data, cor=TRUE)
>> loadings(pc)
> Loadings:
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> [1,] 0.638 0.249 -0.260 -0.679
> [2,] -0.714 0.449 0.298 -0.444
> [3,] 0.585 -0.152 0.522 -0.231 0.555
> [4,] -0.617 -0.543 -0.564
> [5,] -0.496 0.154 0.479 -0.687 -0.172
>
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> SS loadings 1.0 1.0 1.0 1.0 1.0
> Proportion Var 0.2 0.2 0.2 0.2 0.2
> Cumulative Var 0.2 0.4 0.6 0.8 1.0
>> rowSums(pc$loadings^2)
> [1] 1 1 1 1 1
>> # Notice that the row sums (and, from SS loadings above, the column sums)
>> # of the squared loadings all equal 1
>> # Now multiply each loading by its standard deviation
>> sweep(pc$loadings, 2, pc$sdev, "*")
> Loadings:
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> [1,] 0.765 0.275 -0.237 -0.531
> [2,] -0.787 0.427 0.271 -0.347
> [3,] 0.701 -0.167 0.497 -0.211 0.434
> [4,] -0.680 -0.518 -0.515
> [5,] -0.594 0.169 0.456 -0.627 -0.134
>
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> SS loadings 1.436 1.215 0.907 0.832 0.611
> Proportion Var 0.287 0.243 0.181 0.166 0.122
> Cumulative Var 0.287 0.530 0.712 0.878 1.000
>> pc$sdev^2
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> 1.4362072 1.2145055 0.9068555 0.8315685 0.6108632
>> # Now each column's sum of squared loadings equals the squared
>> # standard deviation of that component (aka its eigenvalue)
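The claims in the reply above can be checked numerically. This is a minimal sketch that rebuilds the same pc and data objects. P and Fstd are illustrative names. The sketch relies on princomp() returning center, scale and scores, which it does for a data-matrix input. The rescaled loadings should:

- match the correlations between each variable and each component's scores;
- have column sums of squares equal to pc$sdev^2;
- reproduce the standardized data when multiplied by the standardized scores.

```r
set.seed(42)
data <- matrix(runif(100), 20, 5)
pc <- princomp(data, cor = TRUE)

# Eigenvectors rescaled by each component's standard deviation
P <- sweep(unclass(loadings(pc)), 2, pc$sdev, "*")

# 1. P equals the variable-component correlations (structure matrix)
all.equal(P, cor(data, pc$scores), check.attributes = FALSE)

# 2. Column sums of squared entries equal the eigenvalues
all.equal(colSums(P^2), pc$sdev^2, check.attributes = FALSE)

# 3. Standardized scores times t(P) reproduce the standardized data,
#    so P also acts as the pattern matrix
Z <- scale(data, center = pc$center, scale = pc$scale)
Fstd <- sweep(pc$scores, 2, pc$sdev, "/")
all.equal(Fstd %*% t(P), Z, check.attributes = FALSE)
```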
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Marcelo Kittlein
> Sent: Monday, September 14, 2015 8:46 AM
> To: r-help at r-project.org
> Subject: [R] Error in principal component loadings calculation
>
> Hi all
>
> I have been using "princomp" to obtain the principal components of some
> data and find that the loadings returned by the function appear to have
> some error.
>
> In a simple example, if I calculate the PCs for a random matrix, I get
> the same proportion of variance for the loadings of every
> component:
>
> data <- matrix(runif(100), 20, 5)
> pc <- princomp(data, cor=TRUE)
> loadings(pc)
>
> Loadings:
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> [1,] -0.280 0.510 0.674 -0.217 -0.400
> [2,] 0.529 -0.353 -0.694 -0.330
> [3,] -0.111 0.563 -0.713 -0.336 -0.222
> [4,] -0.530 -0.502 -0.178 0.140 -0.645
> [5,] -0.590 -0.215 -0.582 0.516
>
> Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
> SS loadings 1.0 1.0 1.0 1.0 1.0
> Proportion Var 0.2 0.2 0.2 0.2 0.2
> Cumulative Var 0.2 0.4 0.6 0.8 1.0
>
> This keeps returning the same proportion of variance for each component
> regardless of the data used.
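The output never changes because princomp() returns the loadings as a square, orthonormal matrix of eigenvectors. This is a minimal sketch that assumes any numeric data matrix (a random one here, with the illustrative names x and V). Both crossprod(V) and tcrossprod(V) are the identity, so the squared loadings sum to 1 down every column and across every row.

```r
set.seed(7)
x <- matrix(runif(100), 20, 5)
V <- unclass(loadings(princomp(x, cor = TRUE)))

# Columns are unit-length and mutually orthogonal: t(V) %*% V = I
all.equal(crossprod(V), diag(5), check.attributes = FALSE)

# V is square, so its rows are orthonormal as well: V %*% t(V) = I
all.equal(tcrossprod(V), diag(5), check.attributes = FALSE)
```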
>
> My R version is
>
> > R.Version()
> $platform
> [1] "x86_64-unknown-linux-gnu"
>
> $arch
> [1] "x86_64"
>
> $os
> [1] "linux-gnu"
>
> $system
> [1] "x86_64, linux-gnu"
>
> $status
> [1] ""
>
> $major
> [1] "3"
>
> $minor
> [1] "2.1"
>
> $year
> [1] "2015"
>
> $month
> [1] "06"
>
> $day
> [1] "18"
>
> $`svn rev`
> [1] "68531"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 3.2.1 (2015-06-18)"
>
> $nickname
> [1] "World-Famous Astronaut"
>
> Any hint would be much appreciated.
>
> Best regards
>
> Marcelo Kittlein
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>