Grouped Data

N. Frerebeau

2025-01-14

## Install extra packages (if needed)
# install.packages("folio")

library(nexus)
#> Loading required package: dimensio

1. Reference Groups

Provenance studies typically rely on two approaches, which can be used together:

When coercing a data.frame to a CompositionMatrix object, nexus lets you specify whether an observation belongs to a specific group (or not):

## Data from Wood and Liu 2023
data("bronze", package = "folio")

## Use the third column (dynasties) for grouping
coda <- as_composition(bronze, parts = 4:11, groups = 3)

group() sets the groups of an existing CompositionMatrix. Missing values (NA) can be used to specify that a sample does not belong to any group.

2. Repeated Measurements

If your data contain several observations for the same sample (e.g. repeated measurements), you can use one or more categorical variables to split the data into subsets and compute the compositional mean for each:

## Compositional mean by artefact
coda <- condense(coda, by = list(bronze$dynasty, bronze$reference))
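
To make the notion of a compositional mean concrete, here is a minimal base R sketch, not the nexus implementation. It assumes strictly positive parts and uses hypothetical toy values; the helper names `closure()` and `compositional_mean()` are illustrative only. The compositional mean is taken here as the closed vector of column-wise geometric means, computed separately for each sample.

```r
## Toy data (hypothetical values): two repeated measurements per sample
x <- rbind(
  c(0.80, 0.15, 0.05),
  c(0.70, 0.20, 0.10),
  c(0.60, 0.30, 0.10),
  c(0.50, 0.40, 0.10)
)
colnames(x) <- c("Cu", "Sn", "Pb")
sample_id <- c("A", "A", "B", "B")

## Closure: rescale the parts so that they sum to 1
closure <- function(v) v / sum(v)

## Compositional mean: closed vector of column-wise geometric means
## (assumes no zeros, since logarithms are taken)
compositional_mean <- function(m) {
  closure(exp(colMeans(log(m))))
}

## Split the rows by sample and compute one mean composition per sample
by_sample <- split(as.data.frame(x), sample_id)
means <- t(vapply(
  by_sample,
  function(d) compositional_mean(as.matrix(d)),
  numeric(ncol(x))
))
means
```

Each row of `means` sums to 1, and ratios between parts are those of the geometric means, which is why this differs from a plain arithmetic average of the raw percentages.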

Once groups have been defined, they can be used by further methods (e.g. plotting). Note that for better readability, you can select only some of the parts (e.g. major elements):

## Select major elements
major <- coda[, is_element_major(coda)]

## Compositional bar plot
barplot(major, order_rows = "Cu", space = 0)
[Figure: compositional bar plot of the major elements]

3. Log-Ratio Analysis

## CLR
clr <- transform_clr(coda, weights = TRUE)

## PCA
lra <- pca(clr)

## Visualize results
viz_individuals(
  x = lra, 
  extra_quali = group_names(clr),
  color = c("#004488", "#DDAA33", "#BB5566"),
  hull = TRUE
)

viz_variables(lra)
[Figures: PCA of the CLR-transformed data, showing the individuals with convex hulls by group and the variables]
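
For readers unfamiliar with the centered log-ratio (CLR) transformation, the following base R sketch shows the unweighted version on hypothetical toy values. It is only an illustration under the assumption of strictly positive parts: it does not reproduce the weighting requested above with weights = TRUE, and `clr_sketch()` is a made-up name, not a nexus function.

```r
## Toy compositions (hypothetical values), one row per sample
x <- rbind(
  c(Cu = 0.80, Sn = 0.15, Pb = 0.05),
  c(Cu = 0.70, Sn = 0.20, Pb = 0.10)
)

## Unweighted CLR: log of each part minus the row mean of the logs,
## i.e. the log of each part divided by the row's geometric mean
clr_sketch <- function(m) {
  z <- log(m)
  z - rowMeans(z)
}

z <- clr_sketch(x)

## CLR coordinates of each sample sum to zero (up to rounding error)
rowSums(z)
```

Differences between CLR coordinates are log-ratios of the original parts, so `z[1, "Cu"] - z[1, "Sn"]` equals `log(0.80 / 0.15)`.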

4. Discriminant Analysis

The log-transformed data can be assigned to a new column, allowing us to keep working with the data in the context of the original data.frame:

## ILR
ilr <- transform_ilr(coda)

## MANOVA
fit <- manova(ilr ~ group_names(ilr))
summary(fit)
#>                   Df  Pillai approx F num Df den Df    Pr(>F)    
#> group_names(ilr)   2 0.50288   14.012     14    584 < 2.2e-16 ***
#> Residuals        297                                             
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The MANOVA results (Pillai's trace = 0.503, approximate F = 14.01, p < 2.2e-16) suggest that there are statistically significant differences between groups.

## LDA
discr <- MASS::lda(ilr, grouping = group_names(ilr))
plot(discr)
[Figure: plot of the linear discriminant analysis]

## Back transform results
transform_inverse(discr$means, origin = ilr)
#>                     Cu        Sn         Pb           Zn           Au
#> Eastern Zhou 0.7554175 0.1092147 0.12938075 5.175915e-05 2.937158e-05
#> Shang        0.8349794 0.1098670 0.05282949 7.381956e-05 1.093791e-05
#> Western Zhou 0.8614687 0.1099904 0.02498574 8.707804e-05 2.597934e-05
#>                        Ag          As           Sb
#> Eastern Zhou 0.0012883476 0.003574385 0.0010432265
#> Shang        0.0006390329 0.001391972 0.0002083711
#> Western Zhou 0.0007567810 0.002221231 0.0004640198
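
The back-transformation works because an isometric log-ratio (ILR) transformation is invertible. The base R sketch below illustrates the round trip on hypothetical toy values. It assumes one particular orthonormal basis, built from normalized Helmert contrasts, which is not necessarily the basis nexus uses; `ilr_basis()`, `ilr_sketch()` and `ilr_inverse()` are illustrative names only.

```r
## Orthonormal basis of the zero-sum (CLR) hyperplane, built from
## Helmert contrasts; one possible choice among many
ilr_basis <- function(D) {
  H <- stats::contr.helmert(D)          # D x (D - 1), columns sum to zero
  sweep(H, 2, sqrt(colSums(H^2)), "/")  # scale each column to unit length
}

## ILR: project the CLR coordinates onto the basis
ilr_sketch <- function(m, V) {
  z <- log(m)
  (z - rowMeans(z)) %*% V
}

## Inverse ILR: back to CLR space, exponentiate, then close to sum 1
ilr_inverse <- function(y, V) {
  w <- exp(y %*% t(V))
  w / rowSums(w)
}

## Toy compositions (hypothetical values), rows sum to 1
x <- rbind(
  c(0.80, 0.15, 0.05),
  c(0.70, 0.20, 0.10)
)

V <- ilr_basis(ncol(x))
y <- ilr_sketch(x, V)     # D - 1 coordinates per sample
back <- ilr_inverse(y, V)

## The round trip recovers the original compositions
all.equal(back, x, check.attributes = FALSE)
```

Working in D - 1 unconstrained coordinates is what allows standard multivariate methods to be applied, while the inverse lets results such as group means be reported as proportions again.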
