[R] Principal Component Analysis: Ranking Animal Size Based On Combined Metrics
Sidoti, Salvatore A.
sidoti.23 at buckeyemail.osu.edu
Sun Nov 13 06:46:13 CET 2016
Let's say I perform 4 measurements on an animal: three are linear measurements in millimeters and the fourth is its weight in milligrams. So, we have a data set with mixed units.
Based on these four correlated measurements, I would like to obtain one "score" or value that describes an individual animal's size. I considered simply taking the geometric mean of these 4 measurements, and that would give me a "score" - larger values would be for larger animals, etc.
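For concreteness, the geometric-mean score could be computed along these lines. This is only a sketch: `toy` and `geo_mean` are illustrative names, and the three rows are the first three animals from the data set below, with weight left untransformed and assumed to be in mg after the factor of 1000.

```r
# First three animals from the data set in this post (weight assumed to be mg,
# linear measurements in mm). The names `toy` and `geo_mean` are illustrative.
toy <- data.frame(
  weight  = c(98.0, 62.2, 60.0),
  interoc = c(0.853, 0.865, 0.811),
  cwidth  = c(3.152, 3.046, 3.139),
  clength = c(3.889, 3.733, 3.762)
)

# Geometric mean of a positive vector: exponentiate the mean of the logs.
geo_mean <- function(x) exp(mean(log(x)))

# One score per animal: the geometric mean across its four measurements.
vars <- c("weight", "interoc", "cwidth", "clength")
toy$gm_score <- exp(rowMeans(log(toy[, vars])))

# Rank 1 = largest score.
toy$gm_rank <- rank(-toy$gm_score)
toy
```

Changing the units of one variable multiplies every score by the same constant, so the ranking itself does not depend on the units. The equal weighting of the four variables on the log scale remains.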
However, this assumes that all 4 of these measurements contribute equally to an animal's size. Of course, more than likely this is not the case. I then performed a PCA to discover how much influence each variable had on the overall data set. I was hoping to use this analysis to refine my original approach.
I honestly do not know how to apply the information from the PCA to this particular problem...
I do know, however, that principal components 1 and 2 capture enough of the variation to reduce the number of dimensions to 2: together they account for about 84% of the variance (see the analysis below, which includes the original data set).
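The variance proportions behind that statement can be recomputed from the component standard deviations that summary() reports in the output below. A minimal sketch, with the four values copied by hand from that output (the variable names are illustrative):

```r
# Component standard deviations copied from the summary() output in this post.
sdev <- c(Comp.1 = 1.604107, Comp.2 = 0.8827323,
          Comp.3 = 0.7061206, Comp.4 = 0.3860275)

eigenvalues <- sdev^2                            # variance carried by each component
prop_var    <- eigenvalues / sum(eigenvalues)    # share of the total variance
cum_var     <- cumsum(prop_var)                  # running total across components

round(rbind(eigenvalues, prop_var, cum_var), 4)
```

Because the PCA is run on the correlation matrix, the eigenvalues sum to the number of variables (4, up to rounding), and only Comp.1 has an eigenvalue above 1.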
Note: animal weights were ln() transformed to increase correlation with the 3 other variables.
df <- data.frame(
weight = log(1000*c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
cwidth = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))
# PCA on the correlation matrix (cor = TRUE), so the mixed units are standardized
pca_morpho <- princomp(df, cor = TRUE)
summary(pca_morpho)
Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.604107 0.8827323 0.7061206 0.3860275
Proportion of Variance 0.643290 0.1948041 0.1246516 0.0372543
Cumulative Proportion  0.643290 0.8380941 0.9627457 1.0000000

Loadings:
        Comp.1 Comp.2 Comp.3 Comp.4
weight  -0.371  0.907        -0.201
interoc -0.486 -0.227 -0.840
cwidth  -0.537 -0.349  0.466 -0.611
clength -0.582         0.278  0.761

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
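One possible way to turn this output into a single size score is to use each animal's score on the first component. The sketch below is an assumption about how that could be done, not an established recipe: `pc1_size_score` is a hypothetical helper, and `sim` is simulated stand-in data so that the block runs on its own. It is not the measurements above.

```r
# Hypothetical helper: score each row on the first principal component.
pc1_size_score <- function(data) {
  pca   <- princomp(data, cor = TRUE)   # correlation matrix, as in the post
  load1 <- pca$loadings[, 1]
  # The sign of a component is arbitrary; flip it if the loadings are mostly
  # negative, so that larger measurements give larger scores.
  sgn <- if (sum(load1) < 0) -1 else 1
  sgn * pca$scores[, 1]
}

# Simulated stand-in data: four variables driven by one latent "size" factor.
set.seed(1)
size <- rnorm(49)
sim <- data.frame(
  weight  = size + rnorm(49, sd = 0.5),
  interoc = size + rnorm(49, sd = 0.5),
  cwidth  = size + rnorm(49, sd = 0.5),
  clength = size + rnorm(49, sd = 0.5)
)

score     <- pc1_size_score(sim)
size_rank <- rank(-score)               # rank 1 = largest animal
head(data.frame(score, size_rank))
```

With the data in this post the call would be pc1_size_score(df). All four Comp.1 loadings in the output above are negative, so the sign flip is what makes larger animals score higher. Comp.1 accounts for about 64% of the variance, so whether it is an adequate stand-in for "size" remains a judgment call.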
Any guidance will be greatly appreciated!
Salvatore A. Sidoti
PhD Student
The Ohio State University
Behavioral Ecology