[R] Principal Component Analysis: Ranking Animal Size Based On Combined Metrics

Jim Lemon drjimlemon at gmail.com
Sun Nov 13 21:53:03 CET 2016


Hi Salvatore,
If by "size" you mean volume, why not directly measure the volume of
your animals? They appear to be fairly small. Sometimes working out
what the critical value actually means can inform the way to measure
it.

Jim


On Sun, Nov 13, 2016 at 4:46 PM, Sidoti, Salvatore A.
<sidoti.23 at buckeyemail.osu.edu> wrote:
> Let's say I perform 4 measurements on an animal: three are linear measurements in millimeters and the fourth is its weight in milligrams. So, we have a data set with mixed units.
>
> Based on these four correlated measurements, I would like to obtain one "score" or value that describes an individual animal's size. I considered simply taking the geometric mean of these 4 measurements, and that would give me a "score" - larger values would be for larger animals, etc.
>
> However, this assumes that all 4 of these measurements contribute equally to an animal's size. Of course, more than likely this is not the case. I then performed a PCA to discover how much influence each variable had on the overall data set. I was hoping to use this analysis to refine my original approach.
>
> I honestly do not know how to apply the information from the PCA to this particular problem...
>
> I do know, however, that principal components 1 and 2 capture enough of the variation to reduce the number of dimensions down to 2 (see analysis below with the original data set).
>
> Note: animal weights were ln() transformed to increase correlation with the 3 other variables.
>
> df <- data.frame(
>   weight = log(1000*c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
>              0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
>              0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
>              0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
>              0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
>   interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
>               0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
>               0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
>               0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
>               0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
>   cwidth = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
>              2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
>              2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
>              2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
>              3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
>   clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
>               3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
>               3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
>               3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
>               3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))
>
> pca_morpho <- princomp(df, cor = TRUE)
>
> summary(pca_morpho)
>
> Importance of components:
>                           Comp.1    Comp.2    Comp.3    Comp.4
> Standard deviation      1.604107 0.8827323 0.7061206 0.3860275
> Proportion of Variance  0.643290 0.1948041 0.1246516 0.0372543
> Cumulative Proportion   0.643290 0.8380941 0.9627457 1.0000000
>
> Loadings:
>         Comp.1 Comp.2 Comp.3 Comp.4
> weight  -0.371  0.907        -0.201
> interoc -0.486 -0.227 -0.840
> cwidth  -0.537 -0.349  0.466 -0.611
> clength -0.582         0.278  0.761
>
>                Comp.1 Comp.2 Comp.3 Comp.4
> SS loadings      1.00   1.00   1.00   1.00
> Proportion Var   0.25   0.25   0.25   0.25
> Cumulative Var   0.25   0.50   0.75   1.00
>
> Any guidance will be greatly appreciated!
>
> Salvatore A. Sidoti
> PhD Student
> The Ohio State University
> Behavioral Ecology
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
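
On the question of how to turn the PCA into a single "score": one common convention in morphometrics (not the only one, and not something the posters above proposed) is to use each animal's score on the first principal component as a size index. In the summary posted above, Comp.1 accounts for about 64% of the variance and all four of its loadings have the same sign, which is what one would hope for from a general size axis. The sketch below is only illustrative: it uses the first ten animals from the posted data to stay short, and the names pc1 and size_rank are made up for this example. The sign of a component is arbitrary, so the code flips it when needed, using clength as an assumed reference variable.

```r
# First ten animals from the data posted above (a subset, for brevity only)
df <- data.frame(
  weight  = log(1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                         0.0701, 0.1138, 0.0540, 0.0629, 0.0930)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794))

# PCA on the correlation matrix, so the mixed units do not matter
pca <- princomp(df, cor = TRUE)

# Scores on the first component: one number per animal
pc1 <- pca$scores[, 1]

# The sign of a principal component is arbitrary. Flip it if needed so that
# larger scores go with larger animals (checked here against clength, which
# is an assumption about what "larger" should mean).
if (cor(pc1, df$clength) < 0) pc1 <- -pc1

# Rank the animals: 1 = largest according to the PC1 score
size_rank <- rank(-pc1)
print(data.frame(df, pc1 = round(pc1, 3), size_rank))
```

Running the same steps on the full data frame gives one score per animal for all 49 rows. Whether Comp.1 alone is an adequate measure of "size" remains a judgment call, since it leaves roughly a third of the variance unexplained.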


