Version: | 0.6-1 |
VersionNote: | Released 0.5-0 on 2024-02-06 on CRAN |
Title: | Robust Methods for Multiway Data Analysis, Applicable also for Compositional Data |
Description: | Provides methods for multiway data analysis by means of Parafac and Tucker 3 models. Robust versions (Engelen and Hubert (2011) <doi:10.1016/j.aca.2011.04.043>) and versions for compositional data are also provided (Gallo (2015) <doi:10.1080/03610926.2013.798664>, Di Palma et al. (2018) <doi:10.1080/02664763.2017.1381669>). Several optimization methods alternative to ALS are available (Simonacci and Gallo (2019) <doi:10.1016/j.chemolab.2019.103822>, Simonacci and Gallo (2020) <doi:10.1007/s00500-019-04320-9>). |
Maintainer: | Valentin Todorov <valentin@todorov.at> |
Depends: | R (≥ 2.10) |
Imports: | rrcov, robustbase, ThreeWay, nnls, pracma |
Suggests: | ggplot2, reshape2, xtable, covr |
Encoding: | UTF-8 |
LazyLoad: | no |
LazyData: | no |
License: | GPL (≥ 3) |
URL: | https://github.com/valentint/rrcov3way |
BugReports: | https://github.com/valentint/rrcov3way/issues |
Repository: | CRAN |
Packaged: | 2025-07-05 17:00:27 UTC; valen |
NeedsCompilation: | no |
RoxygenNote: | 7.3.2 |
Author: | Valentin Todorov |
Date/Publication: | 2025-07-05 23:00:02 UTC |
Chemical composition of water in the main stream of Arno river
Description
Chemical composition of water in the main stream of Arno river.
Usage
data("Arno")
Format
A three-way array with dimension 23x11x4. The first dimension refers to 23 distances from the spring. The second dimension refers to the 11 chemical compositions. The third dimension refers to the time of collection - four occasions.
Details
The Arno data example was used in Gallo and Buccinati (2013) to illustrate a particular version of the Tucker model, known as the weighted principal component analysis. The Tucker3 results are usually given in the form of tables or plots and in this work for the representation of the Tucker3 results of logratio data, is proposed to use one-mode plots, clr-joint biplots (Gallo, 2015), and trajectory plots.
Source
Nisi B., Vaselli O., Buccianti A., Minissale A., Delgado-Huertas A., Tassi F., Montegrossi G. (2008). Geochemical and isotopic investigation of the dissolved load in the running waters from the Arno valley: evaluation of the natural and anthropogenic input. In Memorie Descrittive della Carta Geologica d'Italia, Nisi (eds.), 79: 1-91.
Nisi B., Buccianti A., Vaselli O., Perini G., Tassi F., Minissale A., Montegrossi G. (2008) Hydrogeochemistry and strontium isotopes in the Arno river basin (Tuscany, Italy): Constraints on natural controls by statistical modeling. Journal of Hydrology 360: 166-183.
References
Gallo M. and Buccianti A. (2013). Weighted principal component analysis for compositional data: application example for the water chemistry of the Arno river (Tuscany, central Italy), Environmetrics, 24(4):269-277.
Gallo M. (2015). Tucker3 model for compositional data. Communications in Statistics-Theory and Methods, 44(21):4441-4453.
Examples
data(Arno)
dim(Arno) # [1] 23 11 4
dim(Arno[,,1]) # [1] 23 11
rownames(Arno[,,1]) # the 23 distances from the spring
colnames(Arno[,,1]) # the 11 chemical compositions
dim(Arno[,1,]) # [1] 23 4
colnames(Arno[,1,]) # the four occasions
res <- Tucker3(Arno, robust=FALSE, coda.transform="ilr")
res
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Joint biplot
plot(res, which="jbplot", main="Joint biplot")
## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")
Fictious data set of Kiers and Mechelen (2001).
Description
A fictitious data presented by Kiers and Mechelen (2001) to illustrate Tucker3
model: A set of six persons with scores on five response variables for four
different situations. The response variables indicate to what extent each
individual displays an emotional, sensitive, caring, thorough, or accurate
behavior. The data set is represented as a 6 x 5 x 4 array. The data are
chosen such that they correspond to a Tucker3 model with P=2, Q=2, R=2
.
Usage
data(Kiers2001)
Format
A three-way array with dimension 6 x 5 x 4
.
The first dimension refers to the 6 persons. The second dimension
refers to the five responce variables.
The third dimension refers to four different situations.
Source
Kiers HAL, Mechelen IV (2001) Three-way component analysis: Principles and illustrative application. Psychological Methods 6(1):84–110
References
Todorov, V., Simonacci, V., Gallo, M., and Di Palma, M. (2025). Robust tools for three-way component analysis of compositional data: The R package rrcov3way. Behaviormetrika. In press.
Examples
data(Kiers2001)
t3 <- Tucker3(Kiers2001, P=2, Q=2, R=2)
t3
t3$A
t3$B
t3$C
t3$G
plot(t3)
plot(t3, which="jbplot", xlim=c(0, 2))
Parental behaviour in Japan
Description
The data are drawn from a study (Kojima, 1975) of the perception of parental behaviour by parents and their children. Two data sets, boys and girls are available as Kojima.boys and Kojima.girls.
Boys data were analysed in Kroonenberg (2008)
Girls data were analysed in Kroonenberg, Harshman, & Murakami (2009).
Usage
data(Kojima)
Format
Both data sets are three dimensional arrays:
boys: 150 x 18 x 4
girls: 153 x 18 x 4
The rows (1st mode) are 150 Japanese sons/153 Japanese daughters. The columns (2nd mode) are 18 scales (Acceptance, Child centerness, Possesiveness, etc.). The slices (3rd mode) are the 4 judgements (See Details for explanation).
Details
The boys
data are ratings expressing the judgments of parents with respect to their
own behaviour toward their sons, and the judgments of their sons with respect to their
parents. Thus, there are four conditions:
Father-Own behaviour (F-F),
Mother-Own behaviour (M-M),
Son-Father (B-F),
Son-Mother (B-M).
The judgments involved 150 middle-class Japanese eighth-grade boys on the 18 subscales of the inventory. Thus, the data set consists of a 150 (Sons) x 18 (scales) x 4 (judgment combinations) data array.
Similarly, the girls data are ratings expressing the judgments of parents with respect to their own behaviour toward their daughters, and the judgments of their daughters with respect to their parents. Thus, there are four conditions:
Father-Own behaviour (F-F),
Mother-Own behaviour (M-M),
Daughter-Father (G-F),
Daughter-Mother (G-M).
The judgments involved 153 middle-class Japanese eighth-grade girls on the 18 subscales of the inventory. Thus, the data set consists of a 153 (Daughters) x 18 (scales) x 4 (judgment combinations) data array.
Preprocessing Given that the data are three-way profile data they are treated in the standard manner: centering per occasion-variable combination and by normalising the data after centring per lateral slice i.e. per scale over all sons/daughters x judges combinations. For details see Kroonenberg (2008), Chapter 13.
Source
The data sets are available from the Pieter Kroonenberg's web site at https://three-mode.leidenuniv.nl/.
References
Kojima, H. (1975). Inter-battery factor analysis of parents' and children's reports of parental behavior. Japanese Psychological Bulletin, 17, 33-48 (in Japanese).
Kroonenberg, P. M. (2008). Applied multiway data analysis. Wiley series in probability and statistics. Wiley, Hoboken NJ.
Kroonenberg, P. M., Harshman, R. A, & Murakami, T. (2009). Analysing three-way profile data using the Parafac and Tucker3 models illustrated with views on parenting. Applied Multivariate Research, 13:5-41. PDF available at: http://www.phaenex.uwindsor.ca/ojs/leddy/index.php/AMR/article/viewFile/2833/2271
Examples
data(Kojima)
dim(Kojima.boys)
dim(Kojima.girls)
Robust Parafac estimator for compositional data
Description
Compute a robust Parafac model for compositional data
Usage
Parafac(X, ncomp = 2, center = FALSE,
center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"),
scale=FALSE, scale.mode=c("B", "A", "C"),
const="none", conv = 1e-06, start="svd", maxit=10000,
optim=c("als", "atld", "int2"),
robust = FALSE, coda.transform=c("none", "ilr", "clr"),
ncomp.rpca = 0, alpha = 0.75, robiter = 100, crit=0.975, trace = FALSE)
Arguments
X |
3-way array of data |
ncomp |
Number of components |
center |
Whether to center the data |
center.mode |
If centering the data, on which mode to do this |
scale |
Whether to scale the data |
scale.mode |
If scaling the data, on which mode to do this |
const |
Optional constraints for each mode. Can be a three element character
vector or a single character, one of |
conv |
Convergence criterion, defaults to |
start |
Initial values for the A, B and C components. Can be |
maxit |
Maximum number of iterations, default is |
optim |
How to optimize the CP loss function, default is to use ALS, i.e.
|
robust |
Whether to apply a robust estimation |
coda.transform |
If the data are a composition, use an ilr or clr transformation.
Default is non-compositional data, i.e. |
ncomp.rpca |
Number of components for robust PCA |
alpha |
Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75 |
robiter |
Maximal number of iterations for robust estimation |
crit |
Cut-off for identifying outliers, default |
trace |
Logical, provide trace output |
Details
The function can compute four versions of the Parafac model:
Classical Parafac,
Parafac for compositional data,
Robust Parafac and
Robust Parafac for compositional data.
This is controlled though the paramters robust=TRUE
and coda.transform=c("none", "ilr").
Value
An object of class "parafac" which is basically a list with components:
fit |
Fit value |
fp |
Fit percentage |
ss |
Sum of squares |
A |
Orthogonal loading matrix for the A-mode |
B |
Orthogonal loading matrix for the A-mode |
Bclr |
Orthogonal loading matrix for the B-mode, clr transformed. Available only if coda.transform="ilr", otherwise NULL |
C |
Orthogonal loading matrix for the C-mode |
Xhat |
(Robustly) reconstructed array |
const |
Optional constraints (same as the input parameter) |
iter |
Number of iterations |
rd |
Residual distances |
sd |
Score distances |
flag |
The observations whose residual distance |
robust |
The paramater |
coda.transform |
Which coda transformation is used, can be |
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.
Engelen, S., Frosch, S. and Jorgensen, B.M. (2009). A fully robust PARAFAC method analyzing fluorescence data. Journal of Chemometrics, 23(3): 124–131.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
Rousseeuw, P.J. and Driessen, K.V. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3): 212–223.
Egozcue J.J., Pawlowsky-Glahn V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279-300
Examples
#############
##
## Example with the UNIDO Manufacturing value added data
data(va3way)
dim(va3way)
## Treat quickly and dirty the zeros in the data set (if any)
va3way[va3way==0] <- 0.001
##
res <- Parafac(va3way)
res
print(res$fit)
print(res$A)
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
data(ulabor)
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
res
## Plot Orthonormalized A-mode component plot
plot(res, which="comp", mode="A", main="Component plot, A-mode")
## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="B", main="Component plot, B-mode")
## Plot Orthonormalized C-mode component plot
plot(res, which="comp", mode="C", main="Component plot, C-mode")
Robust Tucker3 estimator for compositional data
Description
Compute a robust Tucker3 model for compositional data
Usage
Tucker3(X, P = 2, Q = 2, R = 2,
center = FALSE, center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"),
scale = FALSE, scale.mode = c("B", "A", "C"),
conv = 1e-06, start="svd",
robust = FALSE, coda.transform=c("none", "ilr", "clr"),
ncomp.rpca = 0, alpha = 0.75, robiter=100, crit=0.975, trace = FALSE)
Arguments
X |
3-way array of data |
P |
Number of A-mode components |
Q |
Number of B-mode components |
R |
Number of C-mode components |
center |
Whether to center the data |
center.mode |
If scaling the data, on which mode to do this |
scale |
Whether to scale the data |
scale.mode |
If centering the data, on which mode to do this |
conv |
Convergence criterion, defaults to |
start |
Initial values for the A, B and C components. Can be |
robust |
Whether to apply a robust estimation |
coda.transform |
If the data are a composition, use an ilr or clr transformation.
Default is non-compositional data, i.e. |
ncomp.rpca |
Number of components for robust PCA |
alpha |
Measures the fraction of outliers the algorithm should resist. Allowed values are between 0.5 and 1 and the default is 0.75 |
robiter |
Maximal number of iterations for robust estimation |
crit |
Cut-off for identifying outliers, default |
trace |
Logical, provide trace output |
Details
The function can compute four versions of the Tucker3 model:
Classical Tucker3,
Tucker3 for compositional data,
Robust Tucker3 and
Robust Tucker3 for compositional data.
This is controlled through the parameters robust=TRUE
and coda.transform="ilr"
.
Value
An object of class "tucker3" which is basically a list with components:
fit |
Fit value |
fp |
Fit percentage |
A |
Orthogonal loading matrix for the A-mode |
B |
Orthogonal loading matrix for the B-mode |
Bclr |
Orthogonal loading matrix for the B-mode, clr transformed.
Available only if |
C |
Orthogonal loading matrix for the C-mode |
GA |
Core matrix, which describes the relation between |
iter |
Number of iterations |
rd |
Residual distances |
sd |
Score distances |
flag |
The observations whose residual distance |
robust |
The paramater |
coda.transform |
The input paramater |
La |
Diagonal matrix containing the intrinsic eigenvalues for A-mode |
Lb |
Diagonal matrix containing the intrinsic eigenvalues for B-mode |
Lc |
Diagonal matrix containing the intrinsic eigenvalues for C-mode |
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
Tucker, L.R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31: 279–311.
Egozcue J.J., Pawlowsky-Glahn, V., Mateu-Figueras G. and Barcel'o-Vidal, C. (2003). Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3): 279–300.
Examples
#############
##
## Example with the UNIDO Manufacturing value added data
data(va3way)
dim(va3way)
## Treat quickly and dirty the zeros in the data set (if any)
va3way[va3way==0] <- 0.001
##
res <- Tucker3(va3way)
res
print(res$fit)
print(res$A)
## Print the core matrix
print(res$GA)
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Joint biplot
plot(res, which="jbplot", main="Joint biplot")
## Trajectory
plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
Amino acids fluorescence data.
Description
A data set containing five simple laboratory-made samples where each sample contains different amounts of tyrosine, tryptophan and phenylalanine dissolved in phosphate buffered water. The samples were measured by fluorescence (excitation 240-300 nm, emission 250-450 nm, 1 nm intervals) on a PE LS50B spectrofluorometer.
Usage
data(amino)
Format
A three-way array with dimension 5x201x61
.
The first dimension refers to the 5 samples. The second dimension
refers to the emission measurements (250-450nm, 1nm intervals).
The third dimension refers to the excitation (240-300 nm, 1nm intervals).
Source
https://ucphchemometrics.com/datasets/.
References
Bro, R, PARAFAC: Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 1997, 38, 149-171 Bro, R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. 1998. Ph.D. Thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK). Kiers, H.A.L. (1998) A three-step algorithm for Candecomp/Parafac analysis of large data sets with multicollinearity, Journal of Chemometrics, 12, 155-171.
Examples
## Not run:
data(amino)
## Plotting Emission spectra
oldpar <- par(mfrow=c(2,1))
matplot(t(amino[,,1]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence emission spectra")
matplot(t(amino[,,5]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence emission spectra")
par(oldpar)
## Plotting excitation spectra
oldpar <- par(mfrow=c(2,1))
matplot(t(amino[,1,]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence excitation spectra")
matplot(t(amino[,30,]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence excitation spectra")
par(oldpar)
## End(Not run)
Coefficient of factor congruence (phi)
Description
The function congruence(x, y)
computes the Tucker's
congruence (phi) coefficients among two sets of factors.
Usage
congruence(x, y = NULL)
Arguments
x |
A vector or matrix of factor loadings. |
y |
A vector or matrix of factor loadings (may be NULL). |
Details
Find the Tucker's coefficient of congruence between two sets of factor loadings. Factor congruences are the cosines of pairs of vectors defined by the loadings matrix and based at the origin. Thus, for loadings that differ only by a scaler (e.g. the size of the eigen value), the factor congruences will be 1.
For factor loading vectors of X and Y the measure of factor congruence, phi, is
\phi = \frac{\sum X Y}{\sqrt{\sum(X^2)\sum(Y^2)}}
.
If y=NULL
and x
is a numeric matrix, the congruence
coefficients between the columns of the matrix x
are returned.
The result is a symmetric matrix with ones on the diagonal. If two matrices
are provided, they must have the same size and the result is a square matrix containing the
congruence coefficients between all pairs of columns of the two matrices.
Value
A matrix of factor congruences.
Author(s)
Valentin Todorov, valentin.todorov@chello.at
References
L.R Tucker (1951). A method for synthesis of factor analysis studies. Personnel Research Section Report No. 984. Department of the Army, Washington, DC.
Examples
require(rrcov)
data(delivery, package="robustbase")
X <- getLoadings(PcaClassic(delivery))
Y <- getLoadings(PcaHubert(delivery, k=3))
round(congruence(X,Y),3)
Alternating Least Squares (ALS) for Candecomp/Parafac (CP)
Description
Alternating Least Squares (ALS) algorithm with optional constraints for the minimization of the Candecomp/Parafac (CP) loss function.
Usage
cp_als(
X,
n,
m,
p,
ncomp,
const = "none",
start = "random",
conv = 1e-06,
maxit = 10000,
trace = FALSE
)
Arguments
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
const |
Optional constraints for each mode. Can be a three element
character vector or a single character, one of |
start |
Initial values for the A, B and C components. Can be
|
conv |
Convergence criterion, default is |
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
Value
The result of the decomposition as a list with the following elements:
-
fit
Value of the loss function -
fp
Fit value expressed as a percentage -
ss
Sum of squares -
A
Component matrix for the A-mode -
B
Component matrix for the B-mode -
C
Component matrix for the C-mode -
iter
Number of iterations -
tripcos
Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy -
mintripcos
Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy -
ftiter
Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations -
const
Optional constraints (same as the input parameterconst
)
Note
The argument const
should be a three element character vector.
Set const[j]="none"
for unconstrained update in j-th mode weight
matrix (the default),
const[j]="orth"
for orthogonal update in j-th mode weight matrix,
const[j]="nonneg"
for non-negative constraint on j-th mode or
const[j]="zerocor"
for zero correlation between the extracted
factors.
The default is unconstrained update for all modes.
The loss function to be minimized is sum(k)|| X(k) - A D(k) B' ||^2
,
where D(k)
is a diagonal matrix holding the k
-th row of
C
.
Author(s)
Valentin Todorov, valentin.todorov@chello.at
References
Harshman, R.A. (1970). Foundations of Parafac procedure: models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16: 1–84.
Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39–72.
Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, NJ.
Examples
## Not run:
## Example with the OECD data
data(elind)
dim(elind)
res <- cp_als(elind, ncomp=3)
res$fp
res$fp
res$iter
res <- cp_als(elind, ncomp=3, const="nonneg")
res$A
## End(Not run)
Alternating Trilinear Decomposition (ATLD) for Candecomp/Parafac (CP)
Description
Alternating Trilinear Decomposition algorithm for estimating the Candecomp/Parafac (CP) model. It is based on an alternating least squares principle and replaces the iterative procedure used in the traditional PARAFAC algorithm by an improved procedure. This improved procedure contains Moore-Penrose pseudoinverse computations based on singular value decomposition (SVD) which should be theoretically more robust to similarities in spectra and time profiles
Usage
cp_atld(
X,
n,
m,
p,
ncomp,
conv = 1e-06,
start = "random",
maxit = 5000,
trace = FALSE
)
Arguments
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
conv |
Convergence criterion, default is |
start |
Initial values for the A, B and C components. Can be
|
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
Value
The result of the decomposition as a list with the following elements:
-
fit
Value of the loss function -
fp
Fit value expressed as a percentage -
ss
Sum of squares -
A
Component matrix for the A-mode -
B
Component matrix for the B-mode -
C
Component matrix for the C-mode -
iter
Number of iterations -
tripcos
Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy -
mintripcos
Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy -
ftiter
Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations
Author(s)
Valentin Todorov, valentin.todorov@chello.at; Violetta Simonacci, violetta.simonacci@unina.it
References
H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.
Examples
## Not run:
## Example with the OECD data
data(elind)
dim(elind)
res <- cp_atld(elind, ncomp=3)
res$fp
res$fp
res$iter
## End(Not run)
Generate PARAFAC data sets, optionally with outliers
Description
Generates nsim
data sets according to the given parameters.
If eps > 0
, the specified fraction of random outliers of the identified by the
parameter type
type are added to the data sets.
Usage
cp_gen(
I = 20,
J = 20,
K = 20,
nsim = 200,
nf = 3,
noise = 0.05,
noise1 = 0,
Acol = TRUE,
Bcol = TRUE,
Ccol = TRUE,
congA = 0.5,
congB = 0.5,
congC = 0.5,
eps = 0,
type = c("none", "bl", "gl", "og"),
c1 = 10,
c2 = 0.1,
silent = FALSE
)
Arguments
I |
number of observations |
J |
number of variables |
K |
number of occasions |
nsim |
number of data sets to generate |
nf |
number of PARAFAC components |
noise |
level of homoscedastic (HO) noise |
noise1 |
level of heteroscedastic (HE) noise |
Acol |
whether to apply collinearity with factor congA to mode A |
Bcol |
whether to apply collinearity with factor congB to mode B |
Ccol |
whether to apply collinearity with factor congC to mode C |
congA |
collinearity factor for mode A |
congB |
collinearity factor for mode B |
congC |
collinearity factor for mode C |
eps |
fraction of outliers (percent contamination) |
type |
type of outliers: one of |
c1 |
parameter for outlier generation ( |
c2 |
parameter for outlier generation ( |
silent |
whether to issue warnings |
Value
A list consisting of the following lists:
As list of
nsim
matrices for the mode ABs list of
nsim
matrices for the mode BCs list of
nsim
matrices for the mode CXs list of
nsim
PARAFAC data sets, each with dimension IxJxKOs list of
nsim
vectors containing the added outliers (if any)param list of parameters used for generation of the data sets
References
Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.
Tomasi, G. and Bro, R., (2006). A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis 50 (7), 1700–1734.
Faber, N.M. and Bro, R. and Hopke, P.K. (2003). Recent developments in CANDECOMP/PARAFAC algorithms: A critical review. Chemometrics and Intelligent Laboratory Systems 65, 119–137.
Examples
## Generate one PARAFAC data set (nsim=1) with R=2 components (nf=2) and dimensions
## 50x10x10. Apply 0.15 homoscedastic noise and 0.10 heteroscedastic noise, apply
## collinearity with congruence factor 0.5 to all modes. Add 20% bad leverage points.
library(rrcov3way)
xdat <- cp_gen(I=50, J=100, K=10, nsim=1, nf=2,
noise=0.15, noise1=0.10, Acol=TRUE, Bcol=TRUE, Ccol=TRUE,
congA=0.5, congB=0.5, congC=0.5,
eps=0.2, type="bl")
names(xdat)
ATLD-ALS algorithm for Candecomp/Parafac (CP)
Description
Integrated algorithm combining ATLD and ALS for the minimization of the Candecomp/Parafac (CP) loss function.
Usage
cp_int2(
X,
n,
m,
p,
ncomp,
initconv = 0.01,
conv = 1e-06,
const = "none",
start = "random",
maxit = 5000,
trace = FALSE
)
Arguments
X |
A three-way array or a matrix. If |
n |
Number of A-mode entities |
m |
Number of B-mode entities |
p |
Number of C-mode entities |
ncomp |
Number of components to extract |
initconv |
Convergence criterion for the initialization phase (ATLD),
default is |
conv |
Convergence criterion, default is |
const |
Optional constraints for each mode. Can be a three element
character vector or a single character, one of |
start |
Initial values for the A, B and C components. Can be
|
maxit |
Maximum number of iterations, default is |
trace |
Logical, provide trace output. |
Value
The result of the decomposition as a list with the following elements:
-
fit
Value of the loss function -
fp
Fit value expressed as a percentage -
ss
Sum of squares -
A
Component matrix for the A-mode -
B
Component matrix for the B-mode -
C
Component matrix for the C-mode -
iter
Number of iterations -
tripcos
Minimal triple cosine between two components across the three component matrices, used to inspect degeneracy -
mintripcos
Minimal triple cosine during the iterative algorithm observed at every 10 iterations, used to inspect degeneracy -
ftiter
Matrix containing in each row the function value and the minimal triple cosine at every 10 iterations -
const
Optional constraints (same as the input parameterconst
)
Note
The argument const
should be a three element character vector.
Set const[j]="none"
for unconstrained update in j-th mode weight
matrix (the default),
const[j]="orth"
for orthogonal update in j-th mode weight matrix or
const[j]="zerocor"
for zero correlation between the extracted
factors.
The default is unconstrained update for all modes.
The loss function to be minimized is sum(k)|| X(k) - A D(k) B' ||^2
,
where D(k)
is a diagonal matrix holding the k
-th row of
C
.
Author(s)
Valentin Todorov, valentin.todorov@chello.at; Violetta Simonacci, violetta.simonacci@unina.it
References
H.-L. Wu, M. Shibukawa, K. Oguma, An alternating trilinear decomposition algorithm with application to calibration of HPLC-DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons, Journal of Chemometrics 12 (1998) 1–26.
Simonacci, V. and Gallo, M. (2020). An ATLD–ALS method for the trilinear decomposition of large third-order tensors, Soft Computing 24 13535–13546.
Todorov, V. and Simonacci, V. and Gallo, M. and Trendafilov, N. (2023). A novel estimation procedure for robust CANDECOMP/PARAFAC model fitting. Econometrics and Statistics. In press.
Examples
## Not run:
## Example with the OECD data
data(elind)
dim(elind)
res <- cp_int2(elind, ncomp=3)
res$fp
res$fp
res$iter
res <- cp_int2(elind, ncomp=3, const="orth")
res$A
## End(Not run)
Postprocessing: renormalization, reflection and reordering; access to some of the components of the model.
Description
The estimated model will be renormalized, reflected (change of sign) or the components will be reordered. Functions that provide access to some components of the model: coordinates, weights.
Usage
## S3 method for class 'tucker3'
do3Postprocess(x, reflectA, reflectB, reflectC, reorderA, reorderB, reorderC, ...)
## S3 method for class 'parafac'
do3Postprocess(x, reflectA, reflectB, reflectC, reorder, ...)
## S3 method for class 'parafac'
coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...)
## S3 method for class 'tucker3'
coordinates(x, mode = c("A", "B", "C"), type = c("normalized", "unit", "principal"), ...)
## S3 method for class 'parafac'
weights(object, ...)
## S3 method for class 'tucker3'
weights(object, mode = c("A", "B", "C"), ...)
## S3 method for class 'parafac'
reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)
## S3 method for class 'tucker3'
reflect(x, mode = c("A", "B", "C"), rsign = 1, ...)
Arguments
x |
Tucker3 or Parafac solution |
object |
Tucker3 or Parafac solution (alternative of |
reflectA |
How to handle the signs of the components of mode A - can be a single number or a vector with length of the number of components of A |
reflectB |
How to handle the signs of the components of mode B - can be a single number or a vector with length of the number of components of B |
reflectC |
How to handle the signs of the components of mode C - can be a single number or a vector with length of the number of components of C |
reorder |
How to reorder the components of a Parafac solution - a vector with length of the number of components |
reorderA |
How to reorder the components of mode A - a vector with length of the number of components of A giving the new order |
reorderB |
How to reorder the components of mode B - a vector with length of the number of components of B giving the new order |
reorderC |
How to reorder the components of mode C - a vector with length of the number of components of C giving the new order |
mode |
For which mode to provide the coordinates or weights. Default is mode A |
type |
Which type of coordinates to provide. Default is "normalized" |
rsign |
How to change the sign of the components of the given mode. Can be a single number or a vector with length of the number of components of the corresponding mode. |
... |
Potential further arguments passed to lower level functions. |
Value
The output value of do3Postproces() is the postprocessed solution, Parafac or Tucker3.
The output of weights()
and coordinates()
are the respective values.
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.
Examples
data(elind)
x1 <- do3Scale(elind, center=TRUE, scale=TRUE)
(cp <- Parafac(x1, ncomp=3, const=c("orth", "none", "none")))
cp$B
cp1 <- do3Postprocess(cp, reflectB=-1) # change the sign of all components of B
cp1$B
weights(cp1)
coordinates(cp1)
coordinates(cp1, type="principal")
## Same as above - the centering and scaling is done inside the Parafac procedure
(cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"),
center=TRUE, scale=TRUE))
## Robust estimation with robust scaling with median and mad
(cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"),
center=median, scale=mad, robust=TRUE))
## Robust estimation with robust scaling with median and Qn (Rousseeuw and Croux, 1993)
require(robustbase)
(cp1 <- Parafac(elind, ncomp=3, const=c("orth", "none", "none"),
center=median, scale=Qn, robust=TRUE))
Varimax Rotation for Tucker3 models
Description
Computes varimax rotation of the core and component matrix of a Tucker3 model to simple structure.
Usage
do3Rotate(x, ...)
## S3 method for class 'tucker3'
do3Rotate(x, weights = c(0, 0, 0), rotate = c("A", "B", "C"), ...)
Arguments
x |
A Tucker 3 object |
... |
Potential further arguments passed to called functions. |
weights |
A numeric vector with length 3: relative weights (greater or
equal 0) for the simplicity of the component matrices |
rotate |
Within which mode to rotate the Tucker3 solution:
|
Value
A list including the following components:
Author(s)
Valentin Todorov, valentin.todorov@chello.at
Examples
## Rotation of a Tucker3 solution
data(elind)
(t3 <- Tucker3(elind, 3, 2, 2))
xout <- do3Rotate(t3, c(3, 3, 3), rotate=c("A", "B", "C"))
xout$vvalue
Centering and scaling
Description
Centering and/or normalization of a three way array or a matricized array across one mode (modes indicated by "A", "B" or "C").
Usage
## S3 method for class 'tucker3'
do3Scale(x, renorm.mode = c("A", "B", "C"), ...)
## S3 method for class 'parafac'
do3Scale(x, renorm.mode = c("A", "B", "C"), ...)
## Default S3 method:
do3Scale(x, center = FALSE, scale = FALSE,
center.mode = c("A", "B", "C", "AB", "AC", "BC", "ABC"),
scale.mode = c("B", "A", "C"),
only.data=TRUE, ...)
Arguments
x |
Three dimensional array of order (I x J x K) or a matrix (or data.frame coerced to a matrix) of order (I x JK) containing the matricized array (frontal slices) |
center |
Whether and how to center the data. Can be |
scale |
Whether and how to scale the data. Can be |
center.mode |
Across which mode to center. Default is |
scale.mode |
Within which mode to scale. Default is |
renorm.mode |
Within which mode to renormalize a Parafac or Tucker3 solution.
See in Details how this is performed for the different models.
Default is |
only.data |
Whether to return only the centered/scaled data or also
the center and the scale themselves. Default is |
... |
potential further arguments passed to lower level functions. |
Value
A named list, consisting of the centered and/or scaled data, a center vector, a scale vector and the mode in which the data were centered/scaled.
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
Kiers, H.A.L. (2000).Towards a standardizrd notation and terminology in multiway analysis. Journal of Chemometrics, 14:105-122.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
Examples
data(elind)
(x1 <- do3Scale(elind, center=TRUE, scale=TRUE))
(x2 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="B"))
(x3 <- do3Scale(elind, center=TRUE, scale=TRUE, center.mode="C", scale.mode="C"))
Dorrit fluorescence data.
Description
A data set with 27 synthetic samples containing different concentrations of four analytes (hydroquinone, tryptophan, phenylalanine and dopa) measured in a Perkin-Elmer LS50 B fluorescence spectrometer.
Usage
data(dorrit)
Format
A three-way array with dimension 27 x 116 x 18
.
The first dimension refers to the 27 samples. The second dimension
refers to the emission measurements (251-481nm, 2nm intervals).
The third dimension refers to the excitation (230-315nm, 5nm intervals).
Details
Each fluorescence landscape corresponding to each sample in the original data set consists of 233 emission wavelengths (250-482 nm) and 24 excitation wavelengths (200-315 nm taken each 5 nm). The fluorescence data is three-way. Ideally, the data is trilinear, the components of the modes corresponding to concentrations (27), emission spectra (233) and excitation spectra (24). Hence a three-way PARAFAC model should be capable of uniquely and meaningfully describing the variation in the data set.
The data set is modified as described in Engelen and Hubert (2011):
the emission wavelengths are taken at 2 nm,
noisy parts situated at the excitation wavelengths from 200 to 230 nm and
at emission wavelengths below 250 nm are excluded. The severe Rayleigh
scattering areas present in all samples are replaced by interpolated values.
Thus we end up with a (27 x 116 x 18)
data array.
Source
https://ucphchemometrics.com/datasets/.
References
Bro, R, Sidiropoulos, ND and Smilde, AK (2002). Maximum likelihood fitting using ordinary least squares algorithms. Journal of Chemometrics, 16(8–10), 387–400.
Riu, J and Bro, R (2003) Jack-knife estimation of standard errors and outlier detection in PARAFAC models. Chemometrics and Intelligent Laboratory Systems, 65(1), 35–49
Engelen, S and Hubert, M (2011) Detecting outlying samples in a parallel factor analysis model, Analytica Chemica Acta 705 155–165.
Baunsgaard, D (1999) Factors Affecting 3-way Modelling (PARAFAC) of Fluorescence Landscapes, Royal Veterinary and Agricultural University, Department of Dairy and Food Science, Frederiksberg, Denmark.
Examples
## Not run:
data(dorrit)
## Plotting Emission spectra
oldpar <- par(mfrow=c(2,1))
matplot(t(dorrit[,,1]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence emission spectra")
matplot(t(dorrit[,,5]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence emission spectra")
par(oldpar)
## Plotting excitation spectra
oldpar <- par(mfrow=c(2,1))
matplot(t(dorrit[,1,]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence excitation spectra")
matplot(t(dorrit[,30,]), type="l",
xlab="Wavelength/nm", ylab="Intensity",
main="Fluorescence excitation spectra")
par(oldpar)
persp(as.numeric(dimnames(dorrit)[[2]]),
as.numeric(dimnames(dorrit)[[3]]), dorrit[4,,],
xlab="Emission", ylab="Excitation", zlab="Intensity",
theta = 30, phi = 30, expand = 0.5, col = "lightblue",
ticktype="detailed")
pp <- Parafac(dorrit, ncomp=4, robust=TRUE)
plot(pp)
## End(Not run)
OECD Electronics Industries Data
Description
OECD publishes comparative statistics of the export size of various sectors of the electronics industry:
information science,
telecommunication products,
radio and television equipment,
components and parts,
electromedical equipment, and
scientific equipment.
The data consist of specialisation indices of electronics industries of 23 European countries for the years 1973–1979. The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion for the world as a whole (see D'Ambra, 1985, p. 249 and Kroonenberg, 2008, p.282).
Usage
data(elind)
Format
A three-way array with dimension 23x6x7. The first dimension refers to 23 countries. The second dimension refers to the six indices of electronics industries. The third dimension refers to the years in the period 1978–1985.
Source
The data set is available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/electronicindustriesinfo.htm"
References
D'Ambra, L. (1985). Alcune estensione dell'analisi in componenti principali per lo studio dei sistemi evolutivi. Uno studio sul commercio internazionale dell'elettronica. In: Ricerche Economiche. 2. del Dipartimento di Scienze Economiche Ca'Foscari, Venezia.
Kroonenberg PM (2008). Applied multiway data analysis. Wiley series in probability and statistics. John Wiley and Sons, Hoboken, NJ, p.282.
Examples
data(elind)
res <- Parafac(elind, robust=FALSE, coda.transform="none")
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Per-component plot
plot(res, which="percomp", comp=1, main="Per component plot")
## all components plot
plot(res, which="allcomp", main="All components plot", legend.position="topright")
Sempe girls' growth curves data
Description
Thirty girls selected from a French auxiological study (1953-1975) to get insight into the physical growth patterns of children from ages four to fifteen, Sempe (1987). They were measured yearly between the ages 4 and 15 on the following eight variables:
weight = Weight
length = Length
crump = Crown-rump length
head = Head circumference
chest = Chest circumference
arm = Arm
calf = Calf
pelvis = Pelvis
The data set is three way data array of size 30 (girls) x 8 (variables) x 12 (years).
Usage
data("girls")
Format
The format is a three way array with the following dimensions: The first dimension refers to 30 girls. The second dimension refers to the eight variables measured on the girls. The third dimension refers to the years – 4 to 15.
Details
The data are generally preprocessed as standard multiway profile data. For details see Kroonenberg (2008), Chapters 6 and 15.
Source
The data sets are available from Pieter Kroonenberg's web site at: "three-mode.leidenuniv.nl/data/girlsgrowthcurvesinfo.htm"
References
Sempe, M. (1987). Multivariate and longitudinal data on growing children: Presentation of the French auxiological survey. In J.Janssen et al. Data analysis. The Ins and Outs of solving real problems (pp. 3-6). New York: Plenum Press.
Kroonenberg (2008). Applied multiway data analysis. Wiley series in probability and statistics. Hoboken NJ, Wiley.
Examples
data(girls)
str(girls)
## Center the data in mode A and find the "average girl"
center.girls <- do3Scale(girls, center=TRUE, only.data=FALSE)
X <- center.girls$x
center <- center.girls$center
average.girl <- as.data.frame(matrix(center, ncol=8, byrow=TRUE))
dimnames(average.girl) <- list(dimnames(X)[[3]], dimnames(X)[[2]])
## Divide these variables by 10 to reduce their range
average.girl$weight <- average.girl$weight/10
average.girl$length <- average.girl$length/10
average.girl$crrump <- average.girl$crrump/10
average.girl
p <- ncol(average.girl)
plot(rownames(average.girl), average.girl[,1], ylim=c(min(average.girl),
max(average.girl)), type="n", xlab="Age", ylab="")
for(i in 1: p)
{
lines(rownames(average.girl), average.girl[,i], lty=i, col=i)
points(rownames(average.girl), average.girl[,i], pch=i, col=i)
}
legend <- colnames(average.girl)
legend[1] <- paste0(legend[1], "*")
legend[2] <- paste0(legend[3], "*")
legend[3] <- paste0(legend[4], "*")
legend("topleft", legend=legend, col=1:p, lty=1:p, pch=1:p)
The Khatri-Rao product of two matrices
Description
The function krp(A,B)
returns the Khatri-Rao product of two matrices A
and B
, of
dimensions I x K and J x K respectively. The result is an IJ x K matrix formed by the matching
column-wise Kronecker products, i.e. the k-th column of the Khatri-Rao product is
defined as kronecker(A[, k], B[, k])
.
Usage
krp(A, B)
Arguments
A |
Matrix of order I x K. |
B |
Matrix of order J x K. |
Value
The IJ x K matrix of columnwise Kronecker products.
Author(s)
Valentin Todorov, valentin.todorov@chello.at
References
Khatri, C. G., and Rao, C. Radhakrishna (1968). Solutions to Some Functional Equations and Their Applications to Characterization of Probability Distributions. Sankhya: Indian J. Statistics, Series A 30, 167-180.
Smilde, A., Bro R. and Gelardi, P. (2004). Multi-way Analysis: Applications in Chemical Sciences, Chichester:Wiley
Examples
a <- matrix(1:12, 3, 4)
b <- diag(1:4)
krp(a, b)
krp(b, a)
The trace of a square numeric matrix
Description
Computes the trace of a square numeric matrix. If A
is not numeric and square matrix,
the function terminates with an error message.
Usage
mtrace(A)
Arguments
A |
A square numeric matrix. |
Value
the sum of the values on the diagonal of the matrix A
, i.e. sum(diag(A))
.
Author(s)
Valentin Todorov, valentin.todorov@chello.at
Examples
(a <- matrix(c(5,2,3, 4,-3,7, 4,1,2), ncol=3))
(b <- matrix(c(1,0,1, 0,1,2, 1,0,3), ncol=3))
mtrace(a)
mtrace(b)
## tr(A+B)=tr(A)+tr(B)
all.equal(mtrace(a) + mtrace(b), mtrace(a+b))
## tr(A)=tr(A')
all.equal(mtrace(a), mtrace(t(a)))
## tr(alphA)=alphatr(A)
alpha <- 0.5
all.equal(mtrace(alpha*a), alpha*mtrace(a))
## tr(AB)=tr(BA)
all.equal(mtrace(a %*% b), mtrace(b %*% a))
## tr(A)=tr(BAB-1)
all.equal(mtrace(a), mtrace(b %*% a %*% solve(b)))
Permutation of a matricized array
Description
Permutes the matricized (n
x
m
x
p
) array X
to the
matricized array Y
of order (m
x
p
x
n
).
Usage
permute(X,n,m,p)
Arguments
X |
Matrix (or data.frame coerced to a matrix) containing the matricized array |
n |
Number of |
m |
Number of |
p |
Number of |
Value
Y |
Matrix containing the permuted matricized array |
References
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.
Examples
X <- array(1:24, c(4,3,2))
dim(X)
## matricize the array
Xa <- unfold(X) # matricized X with the A-mode entities in its rows
dim(Xa)
Xa
## matricized X with the B-mode entities in its rows
Xb <- permute(Xa, 4, 3, 2)
dim(Xb)
Xb
## matricized X with the C-mode entities in its rows
Xc <- permute(Xb, 3, 2, 4)
dim(Xc)
Xc
Plot a parafac or a tucker3 object
Description
Different plots for the results of Parafac or Tucker3 analysis, stored in a
Parafac
or a tucker3
object, see Details.
Usage
## S3 method for class 'tucker3'
plot(x, which = c("dd", "comp", "allcomp", "jbplot",
"tjplot", "all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)
## S3 method for class 'parafac'
plot(x, which = c("dd", "comp", "percomp", "allcomp",
"all"), ask = (which == "all" && dev.interactive(TRUE)), id.n, ...)
Arguments
x |
A |
which |
Which plot to select (see Details). Default is |
ask |
Generates all plots in interactive mode |
id.n |
Number of items to highlight |
... |
Other parameters to be passed to the lower level functions. |
Details
Different plots for a tucker3
or parafac
object will be produced. Use the parameter
which
to select which plot to produce:
dd
Distance-distance plot
comp
Paired components plot
percomp
Per-component plot - only for Parafac
allcomp
All components plot
jbplot
Joint biplot - only for Tucker3
tjplot
Trajectory plot - only for Tucker3
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
Kiers, H.A. (2000).Some procedures for displaying results from three-way methods. Journal of Chemometrics. 14(3): 151-170.
Kroonenberg, P.M. (1983).Three-mode principal component analysis: Theory and applications (Vol. 2), DSWO press.
Examples
#############
##
## Example with the UNIDO Manufacturing value added data
data(va3way)
dim(va3way)
## Treat quickly and dirty the zeros in the data set (if any)
va3way[va3way==0] <- 0.001
##
res <- Tucker3(va3way)
res
print(res$fit)
print(res$A)
## Print the core matrix
print(res$GA)
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Joint biplot
plot(res, which="jbplot", main="Joint biplot")
## Trajectory
plot(res, which="tjplot", choices=c(1:4), arrows=FALSE, main="Trajectory biplot")
Student satisfaction data
Description
The primary assessment tool for analyzing student satisfaction is an
annual questionnaire organized into five categories: degree program, course
characteristics, teaching, equipment, and overall satisfaction.
Each questionnaire comprises 18 standardized questions rated on a ten-point
Likert scale (1 = "Strongly Disagree" to 10 = "Strongly Agree"). The
original microdata were aggregated by 10 faculties and six academic years.
Thus, we have a threeway array with dimensions 10 x 18 x 6
.
Usage
data(satisfaction)
Format
A three-way array with dimension 10 x 18 x 6
.
The first dimension refers to the 10 faculties. The second dimension
refers to 18 standardized questions rated on a ten-point Likert scale
(1 = "Strongly Disagree" to 10 = "Strongly Agree").
The third dimension refers to six consequtive academic years (2012–2017).
Source
Italian universities are mandated by Law No. 370/99 to evaluate teaching quality through student opinion surveys. The National Agency for the Evaluation of Universities and Research (ANVUR), established in 2006, supervises the periodic assessment of academic quality. Following Presidential Decree 76/2010, ANVUR standardized methodologies for evaluating institutions and degree programs, with a strong focus on student involvement. The University of Florence provided microdata, which were later aggregated into data for 10 faculties, 18 questions and 6 academic years.
Simonacci V, Gallo M (2017) Statistical tools for student evaluation of academic educational quality. Quality & Quantity 51(2):565–579
References
Gallo M, Simonacci V, Todorov V (2021) A compositional three-way approach for student satisfaction analysis. In: Filzmoser P, Hron K, Martin–Fernandez JA, Palarea-Albaladejo J (eds) Advances in Compositional Data Analysis, Springer, Cham, pp 143–162
Todorov, V., Simonacci, V., Gallo, M., and Di Palma, M. (2025). Robust tools for three-way component analysis of compositional data: The R package rrcov3way. Behaviormetrika. In press.
Examples
data(satisfaction)
t3 <- Tucker3(satisfaction, P=2, Q=2, R=1, coda.transform="clr")
t3
t3$A
t3$B
t3$C
t3$G
plot(t3)
plot(t3, which="jbplot", xlim=c(-1, 1))
Matrix to array conversion
Description
Restore an array from its matricization with all the frontal slices of the array next to each other (mode="A")
Usage
toArray(x, n, m, r, mode = c("A", "B", "C"))
Arguments
x |
Matrix (or data.frame coerced to a matrix) containing the elements of the frontal slices of an array |
n |
number of A-mode elements |
m |
number of B-mode elements |
r |
number of C-mode elements |
mode |
in which mode is the matricized array |
Value
Three way array
Author(s)
Valentin Todorov valentin.todorov@chello.at and Maria Anna Di Palma madipalma@unior.it and Michele Gallo mgallo@unior.it
References
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14: 105–122.
Examples
data(elind)
di <- dim(elind)
toArray(unfold(elind), di[1], di[2], di[3])
Undeclared labor by region in Italy
Description
The dataset contains the undeclared labor in thousands work units. The data originate from Italy and are recorded at a regional level over a certain time horizon for five macroeconomic activities defined according to NACE Rev. 1.1 classification.
Usage
data("ulabor")
Format
A three-way array with dimension 22x5x5. The first dimension refers to 22 regions in Italy. The second dimension refers to the 5 economic activities. The third dimension refers to the years in the period 2001-2009.
Source
ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.
References
ISTAT (2011). Note metodologiche, la misura dell'occupazione non regolare nelle stime di contabilita nazionale [online]. Roma.
Di Palma M.A., Filzmoser P., Gallo M. and Hron, K. (2016). A robust CP model for compositional data, submitted.
Examples
data(ulabor)
dim(ulabor)
str(ulabor)
## Plot robust and non-robust DD-plots of the ilr-transformed data
usr <- par(mfrow=c(1,2))
res1 <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
res2 <- Parafac(ulabor, coda.transform="ilr")
plot(res1)
plot(res2)
par(usr)
## Not run:
## Plot Orthonormalized A-mode component plot
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
plot(res, which="comp", mode="A", main="Component plot, A-mode")
## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="B", main="Component plot, B-mode")
## Plot Orthonormalized B-mode component plot
plot(res, which="comp", mode="C", main="Component plot, C-mode")
## Per component plot
## adapted for the example and only for robust, ilr transformed model
##
##
res <- Parafac(ulabor, robust=TRUE, coda.transform="ilr")
plot(res, which="percomp") # component 1
plot(res, which="percomp", comp=2) # component 2
## End(Not run)
Matrix unfolding
Description
Conducts matricizations of a three-way array into matrices according to the selected mode.
Usage
unfold(x, mode=c("A", "B", "C"))
Arguments
x |
Array to be unfolded |
mode |
the selected mode for unfolding |
Value
A matrix represnting the input array, according to the selected mode:
Mode=A:
B
-mode entities are nested withinC
-mode entities (all the frontal slices of the array next to each other) item Mode=B:C
-mode entities nested withinA
-mode entities (all the horizontal slices of the array next to each other) item Mode C:A
-mode entities nested withinB
-mode entities (all the lateral slices of the array next to each other)
Author(s)
Valentin Todorov valentin.todorov@chello.at
References
H.A.L. Kiers (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics 14:105–122.
Examples
(X <- array(1:24, c(4,3,2)))
dim(X)
## matricize the array
## matricized X with the A-mode entities in its rows
## all the frontal slices of the array next to each other
##
(Xa <- unfold(X))
dim(Xa)
## matricized X with the B-mode entities in its rows
## all the horizontal slices of the array next to each other
##
(Xb <- unfold(X, mode="B"))
dim(Xb)
## matricized X with the C-mode entities in its rows
## all the lateral slices of the array next to each other
##
(Xc <- unfold(X, mode="C"))
dim(Xc)
Manufacturing value added by technology intensity for several years
Description
A three-way array containing manufacturing value added by technology intensity for 55 countries in the period 2000–2010. UNIDO maintains a unique database containing key industrial statistics indicators for more than 160 countries in the world in the period 1963-2011: INDSTAT 2, available at https://stat.unido.org. The data are organized according to the International Standard Industrial Classification of all economic activities (ISIC) Revision 3 at 2-digit level. The present data set was created by aggregating the 23 2-digit divisions into five groups according to technology intensity, using the UNIDO derived classification (Upadhyaya, 2011). Then 55 countries were selected which have relatively complete data in the period 2000–2010.
Usage
data(va3way)
Format
A three-way array with dimension 55x5x11. The first dimension refers to 55 countries. The second dimension refers to the five categories of technology intensity described above. The third dimension refers to the years in the period 2000–2010.
Details
Note that the values in the second mode (sectors) sum up to a constant - the total manufacturing value added of a country in a given year and thus the data set has a compositional character.
Source
References
Upahdyaya S (2011). Derived classifications for industrial performance indicators. In Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS022).
Upadhyaya S, Todorov V (2008). UNIDO Data Quality. UNIDO Staff Working Paper, Vienna.
Examples
data(va3way)
ct <- 2
x <- va3way[ct,,]/1000000
plot(colnames(x), x[1,], ylim=c(min(x), max(x)), type="n", ylab="Manufacturing Value
Added in million USD", xlab="Years")
for(i in 1:nrow(x))
lines(colnames(x), x[i,], col=i)
legend("topleft", legend=rownames(x), col=1:nrow(x), lwd=1)
title(paste("Coutnry: ", rownames(va3way[,,1])[ct]))
## Treat quickly and dirty the zeros in the data set (if any)
## in order to be able to perform ilr transformation:
va3way[va3way==0] <- 0.001
res <- Tucker3(va3way)
##
## Not yet a print function
##
print(res$fit)
print(res$A)
## Print the core matrix
print(res$GA)
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Joint biplot
plot(res, which="jbplot", main="Joint biplot")
## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")
Water quality data in Wyoming, USA
Description
Water quality data for three years of seasonal compositional groundwater chemistry data for 14 wells at a study site in Wyoming, USA. Routine water quality monitoring typically involves measurement of J parameters and constituents measured at I number of static locations at K sets of seasonal occurrences.
Usage
data("waterquality")
Format
A three-way array with dimension 14x12x10. The first dimension refers to 14 wells at a study site in Wyoming, USA. The second dimension refers to the ten most reactive and indicative dissolved constituents at the site: B, Ba, Ca, Cl, K, Mg, Na, Si, Sr, and SO4. In addition, the concentration of water in each sample was calculated. The third dimension refers to the time of collection - ten occasions.
References
Engle, M.A., Gallo, M., Schroeder, K.T., Geboy, N.J., Zupancic, J.W., (2014). Three-way compositional analysis of water quality monitoring data. Environmental and Ecological Statistics, 21(3):565-581.
Examples
data(waterquality)
dim(waterquality) # [1] 14 12 10
dim(waterquality[,,1]) # [1] 14 12
rownames(waterquality[,,1]) # the 14 wells
colnames(waterquality[,,1]) # the 12 chemical compositions
dim(waterquality[,1,]) # [1] 14 10
colnames(waterquality[,1,]) # the ten occasions
(res <- Tucker3(waterquality, robust=FALSE, coda.transform="ilr"))
## Distance-distance plot
plot(res, which="dd", main="Distance-distance plot")
## Paired component plot, mode A
plot(res, which="comp", main="Paired component plot (mode A)")
## Paired component plot, mode B
plot(res, which="comp", mode="B", main="Paired component plot (mode B)")
## Joint biplot
plot(res, which="jbplot", main="Joint biplot")
## Trajectory
plot(res, which="tjplot", main="Trajectory biplot")