[R] problem with PCA
Denis Francisci
denis.francisci at gmail.com
Sat Mar 11 10:21:24 CET 2017
Thank you, David, for your answer.
If I understood correctly, the relative positions of the variable arrows
don't necessarily reflect the correlation coefficients of the original
variables. In fact, these positions change if I use different PC axes.
But in a manual about PCA in R I read: "Pairs of variables that form
acute angles at the origin, close to 0°, should be highly and positively
correlated; variables close to right angles tend to have low correlation;
variables at obtuse angles, close to 180°, tend to have high negative
correlation".
And if I run a test on made-up data, it seems to hold:
tb <- data.frame(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),                # original data
  c(2, 4, 5, 8, 10, 12, 14, 16, 18),           # strong positive correlation
  c(25, 29, 52, 63, 110, 111, 148, 161, 300),  # weaker positive correlation
  c(-1, -2, -3, -4, -5, -6, -7, -8, -9),       # strong negative correlation
  c(3, 8, 4, 6, 1, 3, 2, 5, 7)                 # no correlation
)
names(tb) <- c("orig", "corr+", "corr+2", "corr-", "random")
pca <- prcomp(as.matrix(tb), scale. = TRUE)    # PCA on standardized variables
biplot(pca, choices = c(1, 2))
On the first 2 PCs, the positions of the arrows reflect the original
correlations perfectly.
My data behave differently, maybe because my original variables are not
strongly correlated?
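
To check this beyond eyeballing the plot, here is a small sketch of my own (not taken from the manual). It compares the true correlations with the cosines of the angles between the arrows in the PC1-PC2 plane. The column names pos, pos2, neg and the objects load_all, load_2d, len_2d and cos_2d are names I made up for the illustration. I am also assuming that, with the default scale = 1, the arrows drawn by biplot() are proportional to the eigenvectors multiplied by the component standard deviations.

```r
# Toy data from the message above, with syntactic column names (my choice).
tb <- data.frame(
  orig   = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  pos    = c(2, 4, 5, 8, 10, 12, 14, 16, 18),
  pos2   = c(25, 29, 52, 63, 110, 111, 148, 161, 300),
  neg    = -c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  random = c(3, 8, 4, 6, 1, 3, 2, 5, 7)
)
pca <- prcomp(tb, scale. = TRUE)

# Variable coordinates: eigenvectors scaled by the component standard deviations.
load_all <- pca$rotation %*% diag(pca$sdev)
load_2d  <- load_all[, 1:2]

# Cosine of the angle between every pair of arrows in the PC1-PC2 plane.
len_2d <- sqrt(rowSums(load_2d^2))
cos_2d <- (load_2d %*% t(load_2d)) / outer(len_2d, len_2d)

round(cor(tb), 2)   # true correlations
round(cos_2d, 2)    # what the angles in the biplot suggest
round(len_2d^2, 2)  # share of each variable's variance shown in the plane
```

Using all components, load_all %*% t(load_all) reproduces cor(tb). With only two components the cosines are an approximation, and it should be trusted only for variables whose value in the last line is close to 1.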
2017-03-10 15:49 GMT+01:00 David L Carlson <dcarlson a tamu.edu>:
> This is more a question about principal components analysis than about R.
> You have 4 variables and they are moderately correlated with one another
> (weight and hole are only .2). When the data consist of measurements, this
> usually suggests that the overall size of the object is being partly
> measured by each variable. In your case object size is measured by the
> first principal component (PC1) with larger objects having more negative
> scores so larger objects are on the left and smaller ones are on the right
> of the biplot.
>
> The biplot can only display 2 of the 4 dimensions of your data at one
> time. In the first 2 dimensions, diam and height are close together, but in
> the 3rd dimension (PC3), they are on opposite sides of the component. If
> you plot different pairs of dimensions (e.g. 1 with 3 or 2 with 3, see
> below), the arrows will look different because you are looking from
> different directions.
>
> > pca
> Standard deviations:
> [1] 1.5264292 0.8950379 0.7233671 0.5879295
>
> Rotation:
>               PC1         PC2         PC3        PC4
> height -0.5210224 -0.06545193  0.80018012 -0.2897646
> diam   -0.5473677  0.06309163 -0.57146893 -0.6081376
> hole   -0.4598646 -0.70952862 -0.17476677  0.5045297
> weight -0.4663141  0.69878797 -0.05090785  0.5400508
>
> > biplot(pca, choices=c(1, 3))
> > biplot(pca, choices=c(2, 3))
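
A quick check of this, using only the numbers printed above. For a PCA on scaled variables, the correlation between two variables is the sum over all components of (loading 1) x (loading 2) x (sdev squared). The sketch below splits cor(height, diam) into its four contributions and then computes the cosine of the angle between the two arrows in the PC1-PC2 plane. The object names sdev, rotation, contrib, xy and cos_12 are only illustrative, and the values are retyped by hand from the output.

```r
# Values copied from the prcomp output quoted above.
sdev <- c(1.5264292, 0.8950379, 0.7233671, 0.5879295)
rotation <- rbind(
  height = c(-0.5210224, -0.06545193,  0.80018012, -0.2897646),
  diam   = c(-0.5473677,  0.06309163, -0.57146893, -0.6081376),
  hole   = c(-0.4598646, -0.70952862, -0.17476677,  0.5045297),
  weight = c(-0.4663141,  0.69878797, -0.05090785,  0.5400508)
)
colnames(rotation) <- paste0("PC", 1:4)

# Per-component contribution to cor(height, diam): v_height * v_diam * sdev^2.
contrib <- rotation["height", ] * rotation["diam", ] * sdev^2
round(contrib, 3)
sum(contrib)   # all four components together: about 0.483, matching cor()

# Cosine of the angle between the two arrows in the PC1-PC2 plane.
xy <- sweep(rotation[, 1:2], 2, sdev[1:2], `*`)
cos_12 <- sum(xy["height", ] * xy["diam", ]) /
  sqrt(sum(xy["height", ]^2) * sum(xy["diam", ]^2))
cos_12         # about 0.99: the two arrows are nearly parallel
```

So the two arrows almost coincide on PC1 and PC2, while PC3 contributes about -0.24 to the correlation. That negative contribution is invisible in the first biplot, and it is what brings the correlation down to roughly 0.48.
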
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces a r-project.org] On Behalf Of Denis
> Francisci
> Sent: Friday, March 10, 2017 4:45 AM
> To: R-help Mailing List <r-help a r-project.org>
> Subject: [R] problem with PCA
>
> Hi all.
> I'm a newbie in PCA and I don't understand a behaviour of R.
> I have this data matrix:
>
> >mx_fus
> height diam hole weight
> 1 2.3 3.5 1.1 18
> 2 2.0 3.5 0.9 17
> 3 3.8 4.3 0.7 34
> 4 2.1 3.4 0.9 15
> 5 2.3 3.8 1.0 19
> 6 2.2 3.8 1.0 19
> 7 3.2 4.4 0.9 34
> 8 3.0 4.3 1.0 30
> 9 2.8 3.9 0.9 21
> 10 3.3 4.2 1.1 33
> 11 2.3 3.9 0.9 25
> 12 2.3 3.3 0.5 17
> 13 0.9 2.4 0.4 10
> 14 1.4 2.4 0.5 10
> 15 2.2 3.6 0.7 22
> 16 2.9 3.8 0.8 30
> 17 2.9 3.5 0.6 27
> 18 2.3 3.5 0.5 24
> 19 1.8 2.3 0.5 29
> 20 1.4 2.5 0.6 34
> 21 0.8 2.3 0.6 21
> 22 1.8 2.4 0.6 23
> 23 1.5 2.2 0.6 7
> 24 0.9 1.7 0.4 14
> 25 2.1 2.2 0.5 25
> 26 1.3 2.4 0.6 33
> 27 1.3 2.7 0.4 39
> 28 0.5 2.2 0.5 13
> 29 1.4 4.2 0.8 23
> 30 1.6 2.0 0.4 30
> 31 1.4 2.2 0.6 25
> 32 1.8 2.5 0.6 28
> 33 1.4 2.6 0.6 41
> 34 1.6 2.3 0.3 32
> 35 1.6 2.5 0.5 41
> 36 2.8 2.9 0.8 47
> 37 0.6 2.5 0.8 21
> 38 1.6 2.8 0.7 13
> 39 1.7 3.3 0.8 17
> 40 1.6 3.9 1.9 20
> 41 1.4 4.7 0.9 26
> 42 1.2 4.2 0.7 21
> 43 3.5 4.2 0.9 47
> 44 2.3 3.6 0.7 24
> 45 2.3 3.4 0.4 21
> 46 1.9 2.6 0.7 14
> 47 1.9 3.0 0.7 15
> 48 2.7 3.7 0.9 26
> 49 3.0 3.8 0.7 35
> 50 1.2 2.0 0.7 5
> 51 1.6 2.5 0.5 15
> 52 1.3 2.6 0.5 16
> 53 2.5 3.9 0.9 32
> 54 0.9 3.3 0.6 9
> 55 1.8 2.4 0.5 17
> 56 2.4 3.7 1.1 30
> 57 2.1 3.5 1.1 22
> 58 2.6 3.9 1.0 38
> 59 2.6 3.6 1.0 27
> 60 2.6 4.1 1.0 34
> 61 2.9 3.6 0.8 32
> 62 2.6 3.3 0.7 22
> 63 1.8 2.5 0.7 26
> 64 3.0 2.8 1.3 2
> 65 0.5 2.2 0.4 3
> 66 1.9 3.4 0.7 14
> 67 1.4 3.8 0.9 18
> 68 2.0 4.0 1.0 30
> 69 3.1 4.0 1.3 21
> 70 2.5 4.0 0.8 19
> 71 2.5 4.5 1.0 20
> 72 1.8 3.5 1.4 18
> 73 2.1 3.5 1.4 25
> 74 1.5 2.6 0.5 9
> 75 2.8 3.2 1.2 16
> 76 1.0 5.0 0.3 32
> 77 0.3 5.8 0.5 56
> 78 0.5 1.5 0.2 1
> 79 0.7 1.4 0.2 1
> 80 0.5 1.3 0.2 1
> 81 0.7 3.3 0.4 7
> 82 1.9 4.7 1.0 24
> 83 3.1 4.2 0.9 49
> 84 2.8 3.6 0.7 28
> 85 2.7 3.2 0.7 29
> 86 3.0 4.0 0.9 36
> 87 1.7 2.7 0.7 14
> 88 1.5 2.9 0.7 18
> 89 2.9 3.5 0.7 30
> 90 3.0 3.4 0.8 30
> 91 2.0 2.8 0.5 14
> 92 2.4 3.5 0.7 24
> 93 0.8 4.1 0.6 12
> 94 1.7 2.5 0.5 23
> 95 1.4 2.4 0.8 31
> 96 1.5 2.7 0.4 20
> 97 2.6 3.7 0.6 31
> 98 2.6 3.0 0.6 18
> 99 2.5 5.0 0.7 40
> 100 2.5 3.7 0.5 30
> 101 2.4 2.9 0.7 17
> 102 2.3 3.0 0.5 15
> 103 2.2 3.3 0.6 19
> 104 1.5 2.1 0.5 5
> 105 2.0 2.2 0.5 10
> 106 2.6 3.5 0.6 26
> 107 2.3 3.0 0.6 15
> 108 2.5 4.5 0.7 40
> 109 2.1 3.1 0.5 15
> 110 1.3 2.1 0.8 14
> 111 0.8 2.5 0.2 5
> 112 0.6 3.1 0.7 8
>
> I perform a PCA in R
>
> >pca<-prcomp(mx_fus,scale=TRUE)
> >biplot(pca, choices = c(1,2), cex=0.7)
>
> The biplot puts the arrows of diam and height very close together along
> the first component axis.
> So I understand that these 2 variables are well represented on PC1 and
> that they are correlated with each other.
> But if I test the correlation, the value of the correlation coefficient is low:
>
> >cor(mx_fus[,1],mx_fus[,2])
> 0.4828185
>
> Why does the plot say one thing and the correlation function the opposite?
> Do two nearby arrows not represent a strong correlation between the 2
> variables (as I read in some manuals), but only with the component axis?
>
> Thanks in advance
>