[R] problem with PCA

Denis Francisci denis.francisci at gmail.com
Sat Mar 11 10:21:24 CET 2017


Thank you, David, for your answer.
If I understood correctly, the relative positions of the variable arrows
don't reflect the correlation coefficients of the original variables. In
fact, these positions change if I use different PC axes.
But in a manual about PCA in R I read: "Pairs of variables that form
acute angles at the origin, close to 0°, should be highly and positively
correlated; variables close to right angles tend to have low correlation;
variables at obtuse angles, close to 180°, tend to have high negative
correlation".

And if I run a test on made-up data, it seems to be true:

tb <- data.frame(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),                   # original data
  c(2, 4, 5, 8, 10, 12, 14, 16, 18),              # strong positive correlation
  c(25, 29, 52, 63, 110, 111, 148, 161, 300),     # weaker positive correlation
  c(-1, -2, -3, -4, -5, -6, -7, -8, -9),          # strong negative correlation
  c(3, 8, 4, 6, 1, 3, 2, 5, 7)                    # no clear correlation
)
names(tb) <- c("orig", "corr+", "corr+2", "corr-", "random")

# scale. = TRUE standardizes each variable, so the PCA works on the correlation matrix
pca <- prcomp(as.matrix(tb), scale. = TRUE)
biplot(pca, choices = c(1, 2))

On the first two PCs the positions of the arrows reflect the original
correlations perfectly.
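
The visual impression can also be checked numerically. The following is only
a rough sketch: it assumes the default biplot() scaling (scale = 1), under
which, to my understanding, the arrow for each variable is proportional to
its loading multiplied by the standard deviation of the component. The
column names pos, pos2 and neg are syntactic stand-ins for "corr+",
"corr+2" and "corr-", and arrows2d, len and cos_angle are names made up for
this illustration.

```r
# Same fictional data as above, with syntactic column names
tb <- data.frame(
  orig   = 1:9,
  pos    = c(2, 4, 5, 8, 10, 12, 14, 16, 18),
  pos2   = c(25, 29, 52, 63, 110, 111, 148, 161, 300),
  neg    = -(1:9),
  random = c(3, 8, 4, 6, 1, 3, 2, 5, 7)
)
pca <- prcomp(tb, scale. = TRUE)

# Assumed arrow coordinates in the PC1-PC2 plane: loading * component sd
arrows2d <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Cosine of the angle between every pair of arrows
len <- sqrt(rowSums(arrows2d^2))
cos_angle <- (arrows2d %*% t(arrows2d)) / (len %o% len)

round(cor(tb), 2)      # correlations of the original variables
round(cos_angle, 2)    # what the angles in the biplot suggest

# Share of total variance shown by the first two components
summary(pca)$importance["Cumulative Proportion", 2]
```

The closer the cumulative proportion is to 1, the closer the two printed
matrices should agree, because little of the data is left outside the plane
of the plot.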

My data behave differently; maybe because my original variables are not
strongly correlated?
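
A possible way to look at this for my data, using only the numbers in the
prcomp() printout quoted below (so the values are rounded, and the result is
approximate). With scaled variables the correlation matrix can be rebuilt
from all four components, while the biplot only uses two of them. This
sketch again assumes the default biplot() scaling; cor_full, arrows2d,
cos_angle and var_2pc are names invented for the illustration.

```r
# Standard deviations and loadings copied from the quoted prcomp() output
sdev <- c(1.5264292, 0.8950379, 0.7233671, 0.5879295)
rotation <- matrix(
  c(-0.5210224, -0.06545193,  0.80018012, -0.2897646,
    -0.5473677,  0.06309163, -0.57146893, -0.6081376,
    -0.4598646, -0.70952862, -0.17476677,  0.5045297,
    -0.4663141,  0.69878797, -0.05090785,  0.5400508),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("height", "diam", "hole", "weight"), paste0("PC", 1:4))
)

# All four components together reproduce the correlation matrix
cor_full <- rotation %*% diag(sdev^2) %*% t(rotation)

# Assumed arrow coordinates in the PC1-PC2 plane, and the cosine of the
# angle between each pair of arrows
arrows2d <- rotation[, 1:2] %*% diag(sdev[1:2])
len <- sqrt(rowSums(arrows2d^2))
cos_angle <- (arrows2d %*% t(arrows2d)) / (len %o% len)

# Share of total variance visible in the PC1-PC2 biplot
var_2pc <- sum(sdev[1:2]^2) / sum(sdev^2)

round(cor_full["height", "diam"], 3)   # about 0.483, as cor() reported
round(cos_angle["height", "diam"], 3)  # close to 1: arrows almost overlap
round(var_2pc, 3)                      # roughly 0.78
```

If this is right, the arrows of height and diam nearly coincide on PC1 and
PC2 only because the part of the data where they differ (mostly PC3) is not
shown in that plot.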

2017-03-10 15:49 GMT+01:00 David L Carlson <dcarlson a tamu.edu>:

> This is more a question about principal components analysis than about R.
> You have 4 variables and they are moderately correlated with one another
> (weight and hole are only .2). When the data consist of measurements, this
> usually suggests that the overall size of the object is being partly
> measured by each variable. In your case object size is measured by the
> first principal component (PC1) with larger objects having more negative
> scores so larger objects are on the left and smaller ones are on the right
> of the biplot.
>
> The biplot can only display 2 of the 4 dimensions of your data at one
> time. In the first 2 dimensions, diam and height are close together, but in
> the 3rd dimension (PC3), they are on opposite sides of the component. If
> you plot different pairs of dimensions (e.g. 1 with 3 or 2 with 3, see
> below), the arrows will look different because you are looking from
> different directions.
>
> > pca
> Standard deviations:
> [1] 1.5264292 0.8950379 0.7233671 0.5879295
>
> Rotation:
>               PC1         PC2         PC3        PC4
> height -0.5210224 -0.06545193  0.80018012 -0.2897646
> diam   -0.5473677  0.06309163 -0.57146893 -0.6081376
> hole   -0.4598646 -0.70952862 -0.17476677  0.5045297
> weight -0.4663141  0.69878797 -0.05090785  0.5400508
>
> > biplot(pca, choices=c(1, 3))
> > biplot(pca, choices=c(2, 3))
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces a r-project.org] On Behalf Of Denis
> Francisci
> Sent: Friday, March 10, 2017 4:45 AM
> To: R-help Mailing List <r-help a r-project.org>
> Subject: [R] problem with PCA
>
> Hi all.
> I'm a newbie in PCA and I don't understand a behaviour of R.
> I have this data matrix:
>
> >mx_fus
>   height diam  hole  weight
> 1    2.3  3.5  1.1   18
> 2    2.0  3.5  0.9   17
> 3    3.8  4.3  0.7   34
> 4    2.1  3.4  0.9   15
> 5    2.3  3.8  1.0   19
> 6    2.2  3.8  1.0   19
> 7    3.2  4.4  0.9   34
> 8    3.0  4.3  1.0   30
> 9    2.8  3.9  0.9   21
> 10   3.3  4.2  1.1   33
> 11   2.3  3.9  0.9   25
> 12   2.3  3.3  0.5   17
> 13   0.9  2.4  0.4   10
> 14   1.4  2.4  0.5   10
> 15   2.2  3.6  0.7   22
> 16   2.9  3.8  0.8   30
> 17   2.9  3.5  0.6   27
> 18   2.3  3.5  0.5   24
> 19   1.8  2.3  0.5   29
> 20   1.4  2.5  0.6   34
> 21   0.8  2.3  0.6   21
> 22   1.8  2.4  0.6   23
> 23   1.5  2.2  0.6    7
> 24   0.9  1.7  0.4   14
> 25   2.1  2.2  0.5   25
> 26   1.3  2.4  0.6   33
> 27   1.3  2.7  0.4   39
> 28   0.5  2.2  0.5   13
> 29   1.4  4.2  0.8   23
> 30   1.6  2.0  0.4   30
> 31   1.4  2.2  0.6   25
> 32   1.8  2.5  0.6   28
> 33   1.4  2.6  0.6   41
> 34   1.6  2.3  0.3   32
> 35   1.6  2.5  0.5   41
> 36   2.8  2.9  0.8   47
> 37   0.6  2.5  0.8   21
> 38   1.6  2.8  0.7   13
> 39   1.7  3.3  0.8   17
> 40   1.6  3.9  1.9   20
> 41   1.4  4.7  0.9   26
> 42   1.2  4.2  0.7   21
> 43   3.5  4.2  0.9   47
> 44   2.3  3.6  0.7   24
> 45   2.3  3.4  0.4   21
> 46   1.9  2.6  0.7   14
> 47   1.9  3.0  0.7   15
> 48   2.7  3.7  0.9   26
> 49   3.0  3.8  0.7   35
> 50   1.2  2.0  0.7    5
> 51   1.6  2.5  0.5   15
> 52   1.3  2.6  0.5   16
> 53   2.5  3.9  0.9   32
> 54   0.9  3.3  0.6    9
> 55   1.8  2.4  0.5   17
> 56   2.4  3.7  1.1   30
> 57   2.1  3.5  1.1   22
> 58   2.6  3.9  1.0   38
> 59   2.6  3.6  1.0   27
> 60   2.6  4.1  1.0   34
> 61   2.9  3.6  0.8   32
> 62   2.6  3.3  0.7   22
> 63   1.8  2.5  0.7   26
> 64   3.0  2.8  1.3    2
> 65   0.5  2.2  0.4    3
> 66   1.9  3.4  0.7   14
> 67   1.4  3.8  0.9   18
> 68   2.0  4.0  1.0   30
> 69   3.1  4.0  1.3   21
> 70   2.5  4.0  0.8   19
> 71   2.5  4.5  1.0   20
> 72   1.8  3.5  1.4   18
> 73   2.1  3.5  1.4   25
> 74   1.5  2.6  0.5    9
> 75   2.8  3.2  1.2   16
> 76   1.0  5.0  0.3   32
> 77   0.3  5.8  0.5   56
> 78   0.5  1.5  0.2    1
> 79   0.7  1.4  0.2    1
> 80   0.5  1.3  0.2    1
> 81   0.7  3.3  0.4    7
> 82   1.9  4.7  1.0   24
> 83   3.1  4.2  0.9   49
> 84   2.8  3.6  0.7   28
> 85   2.7  3.2  0.7   29
> 86   3.0  4.0  0.9   36
> 87   1.7  2.7  0.7   14
> 88   1.5  2.9  0.7   18
> 89   2.9  3.5  0.7   30
> 90   3.0  3.4  0.8   30
> 91   2.0  2.8  0.5   14
> 92   2.4  3.5  0.7   24
> 93   0.8  4.1  0.6   12
> 94   1.7  2.5  0.5   23
> 95   1.4  2.4  0.8   31
> 96   1.5  2.7  0.4   20
> 97   2.6  3.7  0.6   31
> 98   2.6  3.0  0.6   18
> 99   2.5  5.0  0.7   40
> 100  2.5  3.7  0.5   30
> 101  2.4  2.9  0.7   17
> 102  2.3  3.0  0.5   15
> 103  2.2  3.3  0.6   19
> 104  1.5  2.1  0.5    5
> 105  2.0  2.2  0.5   10
> 106  2.6  3.5  0.6   26
> 107  2.3  3.0  0.6   15
> 108  2.5  4.5  0.7   40
> 109  2.1  3.1  0.5   15
> 110  1.3  2.1  0.8   14
> 111  0.8  2.5  0.2    5
> 112  0.6  3.1  0.7    8
>
> I performed a PCA in R:
>
> >pca<-prcomp(mx_fus,scale=TRUE)
> >biplot(pca, choices = c(1,2), cex=0.7)
>
> The biplot puts the arrows of diam and height very close together along
> the first component axis.
> So I understand that these 2 variables are well represented in PC1 and
> that they are correlated with each other.
> But if I test the correlation, the value of the correlation coefficient is low:
>
> >cor(mx_fus[,1],mx_fus[,2])
> 0.4828185
>
> Why does the plot say one thing and the correlation function say the opposite?
> Do two nearby arrows not represent a strong correlation between the 2
> variables (as I read in some manuals), but only with the component axis?
>
> Thanks in advance
>
