[R] Axis scaling for PCA biplot
Christian Hennig
christian.hennig sending from unibo.it
Tue Nov 15 12:53:30 CET 2022
Hi there,
I'm puzzled about the axis scaling in the PCA biplot.
Here's an example.
library(pdfCluster) # package cepp seems to have the same data set.
data(oliveoil) # 572 observations 10 variables
olive <- oliveoil[,3:10] # numerical variables
prolive <- princomp(olive)
summary(prolive)
# Importance of components:
#                             Comp.1       Comp.2       Comp.3       Comp.4
# Standard deviation     479.7299024 150.82827868 45.394449751 27.522646558
# Proportion of Variance   0.8970072   0.08866821  0.008031707  0.002952451
# Cumulative Proportion    0.8970072   0.98567544  0.993707152  0.996659603
#                             Comp.5       Comp.6       Comp.7       Comp.8
# Standard deviation     24.78169442 1.196956e+01 7.1390744088 6.9756965249
# Proportion of Variance  0.00239367 5.584168e-04 0.0001986489 0.0001896608
# Cumulative Proportion   0.99905327 9.996117e-01 0.9998103392 1.0000000000
plot(prolive$scores)
# The scaling of this plot reproduces the variances of the components given
# in the summary, as does cov(prolive$scores) (up to the n versus n-1
# divisor). This all seems fine, however...
biplot(prolive)
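To make the first check concrete, here is a minimal sketch on the built-in USArrests data, used only as a stand-in so that it runs without pdfCluster. The names pc, score_var and off_diag are illustrative. It compares the variances of the scores with the squared standard deviations reported by princomp, which uses divisor n rather than n-1.

```r
# Stand-in data: USArrests ships with R (assumption: any numeric data set
# behaves the same way for this check).
pc <- princomp(USArrests)
n <- pc$n.obs

# princomp() uses divisor n, whereas var()/cov() use n - 1,
# so the two agree only after rescaling by (n - 1) / n.
score_var <- apply(pc$scores, 2, var)
print(all.equal(unname(score_var * (n - 1) / n), unname(pc$sdev^2)))

# The scores are uncorrelated, so cov(pc$scores) is diagonal up to rounding.
off_diag <- cov(pc$scores)
diag(off_diag) <- 0
print(max(abs(off_diag)))
```

With 572 observations the factor (n - 1) / n is so close to 1 that the difference is barely visible in the olive oil example.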
I have no idea what the numbers on the axes of the biplot are, at least
not the larger ones. Chances are the smaller ones indicate the loadings.
The larger ones are neither the same as in the first plot, nor are they
standardised to one, but they seem to be standardised somehow, as the
range on x- and y-axis looks the same, which it shouldn't be if
variances represented the PCA eigenvalues.
Can anyone explain this to me?
Actually the help page of biplot.princomp says something about this, but I
can't get my head around it:
"scale
The variables are scaled by lambda ^ scale and the observations are
scaled by lambda ^ (1-scale) where lambda are the singular values as
computed by princomp
<https://www.rdocumentation.org/link/princomp?package=stats&version=3.6.2>.
Normally 0 <= scale <= 1, and a warning will be issued if the
specified scale is outside this range."
The default value of scale seems to be 1, but then (1-scale) is zero so
I'd assume data to be unscaled, but that should have reproduced the
"plot" scale, shouldn't it?
Thanks,
Christian
--
Christian Hennig
Dipartimento di Scienze Statistiche "Paolo Fortunati",
Universita di Bologna, phone +39 05120 98163
christian.hennig using unibo.it