[R] Axis scaling for PCA biplot

Christian Hennig christian.hennig at unibo.it
Tue Nov 15 12:53:30 CET 2022


Hi there,

I'm puzzled about the axis scaling in the PCA biplot.

Here's an example.

library(pdfCluster) # package cepp seems to have the same data set.

data(oliveoil) # 572 observations 10 variables

olive <- oliveoil[,3:10] # numerical variables

prolive <- princomp(olive)
summary(prolive)
# Importance of components:
#                             Comp.1       Comp.2       Comp.3       Comp.4
# Standard deviation     479.7299024 150.82827868 45.394449751 27.522646558
# Proportion of Variance   0.8970072   0.08866821  0.008031707  0.002952451
# Cumulative Proportion    0.8970072   0.98567544  0.993707152  0.996659603
#                             Comp.5       Comp.6       Comp.7       Comp.8
# Standard deviation     24.78169442 1.196956e+01 7.1390744088 6.9756965249
# Proportion of Variance  0.00239367 5.584168e-04 0.0001986489 0.0001896608
# Cumulative Proportion   0.99905327 9.996117e-01 0.9998103392 1.0000000000

plot(prolive$scores)

# The scaling of this plot reproduces the variances of the components given
# in the summary, as does cov(prolive$scores). This all seems fine, however...

biplot(prolive)

I have no idea what the numbers on the axes of the biplot are, at least 
not the larger ones. Chances are the smaller ones indicate the loadings. 
The larger ones are not the same as in the first plot, nor are they 
standardised to one, but they seem to be standardised somehow: the 
ranges on the x- and y-axes look the same, which they shouldn't if the 
variances along the axes represented the PCA eigenvalues.

Can anyone explain this to me?

Actually the help page of biplot.princomp says something about this, but I 
can't get my head around it:

"scale

The variables are scaled by lambda ^ scale and the observations are 
scaled by lambda ^ (1-scale) where lambda are the singular values as 
computed by princomp 
<https://www.rdocumentation.org/link/princomp?package=stats&version=3.6.2>. 
Normally 0 <= scale <= 1, and a warning will be issued if the 
specified scale is outside this range."

The default value of scale seems to be 1, but then (1-scale) is zero, so 
I'd assume the data to be unscaled. That should have reproduced the 
scale of the "plot" output, shouldn't it?

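A minimal sketch of how the default scaling could be checked numerically. 
It assumes (from a reading of the source of stats:::biplot.princomp, not 
from the help page) that lambda is sdev * sqrt(n.obs), that the scores are 
divided by lambda^scale and that the loadings are multiplied by it. 
USArrests is only a stand-in so that the snippet runs without pdfCluster; 
the names lam, obs, vars and obs0 are illustrative.

```r
# Stand-in data shipped with base R (assumption: any numeric data set will do).
pc <- princomp(USArrests)
n <- pc$n.obs
choices <- 1:2

# Assumed definition of lambda: component sdev times sqrt(number of observations),
# raised to the power 'scale' (default scale = 1).
scale <- 1
lam <- (pc$sdev[choices] * sqrt(n))^scale

# Observation coordinates under this assumption: scores divided by lambda.
obs <- t(t(pc$scores[, choices]) / lam)
# Variable (arrow) coordinates under this assumption: loadings times lambda.
vars <- t(t(pc$loadings[, choices]) * lam)

# princomp uses divisor n, so each rescaled score column has sum of squares 1.
print(colSums(obs^2))

# With scale = 0 lambda becomes 1 and the raw scores are returned unchanged.
lam0 <- (pc$sdev[choices] * sqrt(n))^0
obs0 <- t(t(pc$scores[, choices]) / lam0)
```

If that assumption holds, scale = 1 gives each plotted component a sum of 
squares of 1, which would explain why both axes span a similar range, and 
biplot(prolive, scale = 0) should show the observations on the scale of 
plot(prolive$scores).
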
Thanks,

Christian

-- 
Christian Hennig
Dipartimento di Scienze Statistiche "Paolo Fortunati",
Universita di Bologna, phone +39 05120 98163
christian.hennig using unibo.it
