Introduction to Archetypal Package

Demetris T. Christopoulos

17/12/2019

Archetypal Analysis (AA)

Archetypal Analysis (AA) differs from Cluster Analysis (CA) because it focuses on the extreme points, which usually lie close to the Convex Hull (CH), rather than on points inside the cloud as in CA or other centroid-based analyses.

Here we adopt a convention that is common in Econometrics. The usual data frame “df” is a matrix Y of dimension \(n \times d\), where n is the number of rows (observations) and d is the number of variables, i.e. the dimension of the relevant linear space \(R^d\).

The output of our AA algorithm gives matrices A, B such that: \[ Y\sim\,A\,B\,Y \] or, to be more strict, our \(kappas\times\,d\) matrix of archetypes \(B\,Y\) is such that the following squared Frobenius norm is minimized: \[ SSE=\|Y-A\,B\,Y \|^2 = minimum \] Here A (\(n\times\,kappas\)) and B (\(kappas\times\,n\)) are row-stochastic matrices, i.e. their entries are non-negative and every row sums to one. We also define the Variance Explained as: \[ varexpl=\frac{SST-SSE}{SST} \] where SST is the total sum of squares of the elements of Y.
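
The definitions above can be made concrete with a few lines of base R. This is only a sketch: A and B are random row-stochastic matrices rather than fitted ones, and the helper `row_stochastic()` and all object names are hypothetical, introduced here for illustration.

```r
# Sketch: SSE and variance explained for given row-stochastic A and B.
# A and B are random here, not the output of the AA algorithm.
set.seed(1)
n <- 20; d <- 2; kappas <- 3
Y <- matrix(runif(n * d), nrow = n, ncol = d)

# Hypothetical helper: non-negative matrix whose rows each sum to 1
row_stochastic <- function(nr, nc) {
  M <- matrix(runif(nr * nc), nrow = nr, ncol = nc)
  M / rowSums(M)   # element (i, j) is divided by the sum of row i
}
A <- row_stochastic(n, kappas)   # n x kappas
B <- row_stochastic(kappas, n)   # kappas x n

BY  <- B %*% Y                   # kappas x d matrix of archetypes
SSE <- sum((Y - A %*% BY)^2)     # squared Frobenius norm of the residual
SST <- sum(Y^2)                  # total sum of squares of the elements of Y
varexpl <- (SST - SSE) / SST
```

Because every row of B sums to one, each row of BY is a convex combination of rows of Y, which is why archetypes can never fall outside the convex hull of the data.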

The implementation is a suitable modification of the PCHA algorithm, see [1], [2], which works on data frames without transposing them and gives full control over all of its external and internal parameters.

Compute AA for a data frame

Let’s first create some 2D points that are, by construction, convex combinations of three outer points:

library(archetypal)
p1=c(1,2);p2=c(3,5);p3=c(7,3) 
dp=rbind(p1,p2,p3);dp
##    [,1] [,2]
## p1    1    2
## p2    3    5
## p3    7    3
set.seed(9102)
pts=t(sapply(1:100, function(i,dp){
  cc=runif(3)
  cc=cc/sum(cc)
  colSums(dp*cc)
},dp))
df=data.frame(pts)
colnames(df)=c("x","y")
head(df)
##          x        y
## 1 3.642835 3.717916
## 2 3.581289 2.927728
## 3 2.723810 3.688769
## 4 3.128537 3.733146
## 5 5.123878 2.745874
## 6 3.719885 3.449477

The matrix dp holds the three vertices of the outer triangle and df is our data set for AA. Let’s plot them all:
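
One possible way to draw such a plot with base graphics is sketched below. The plotting symbols, colours and title are arbitrary choices made here; the data are regenerated with the same code as above so that the block runs on its own.

```r
# Sketch: regenerate the points as above and draw them with the outer triangle.
p1 <- c(1, 2); p2 <- c(3, 5); p3 <- c(7, 3)
dp <- rbind(p1, p2, p3)
set.seed(9102)
pts <- t(sapply(1:100, function(i, dp) {
  cc <- runif(3)
  cc <- cc / sum(cc)        # random convex weights
  colSums(dp * cc)          # weighted sum of the three vertices
}, dp))
df <- data.frame(pts)
colnames(df) <- c("x", "y")

# Generators as crosses, the triangle as a dashed outline, data as blue dots
plot(dp, pch = 3, cex = 2, xlab = "x", ylab = "y",
     main = "Generators and data points")
polygon(dp, lty = 2)
points(df, col = "blue", pch = 19, cex = 0.7)
```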

The above mentioned data frame “df” can be loaded as:

# data("wd2")
# df=wd2

Since the number of archetypes is kappas=3 by construction, we run the “archetypal” function with three archetypes:

aa = archetypal(df = df, kappas = 3, verbose = TRUE, rseed = 9102, save_history = TRUE)
## Time for computing Projected Convex Hull was 0.01 secs
## Next projected convex hull initial solution will be used... 
##           x        y
## 34 5.687791 3.481611
## 62 1.961799 2.793497
## 5  5.123878 2.745874
##   
## Time for the 25 initial A updates was 0.3 secs 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   Iter|   VarExpl|          SSE | |dSSE|/SSE |       muB|       muA| t(sec)|Aup;dwn|Bup;dwn 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##     1 | 0.994653 | 1.347706e+01 |   9.12e-01 | 1.55e+00 | 4.61e+00 |   0.1 |  10;2 |  10;2 
##     2 | 0.995609 | 1.106752e+01 |   2.18e-01 | 9.58e+00 | 7.14e+00 |   0.2 |  10;2 |  10;0 
##     3 | 0.995975 | 1.014499e+01 |   9.09e-02 | 1.48e+01 | 5.53e+00 |   0.3 |  10;3 |  10;2 
##     4 | 0.996104 | 9.818920e+00 |   3.32e-02 | 4.59e+01 | 4.28e+00 |   0.2 |  10;3 |  10;1 
##     5 | 0.996177 | 9.636195e+00 |   1.90e-02 | 3.55e+01 | 6.62e+00 |   0.2 |  10;2 |  10;3 
##     6 | 0.996237 | 9.484944e+00 |   1.59e-02 | 5.50e+01 | 5.13e+00 |   0.2 |  10;3 |  10;2 
##     7 | 0.996292 | 9.345900e+00 |   1.49e-02 | 4.26e+01 | 3.97e+00 |   0.3 |  10;3 |  10;3 
##     8 | 0.996350 | 9.200281e+00 |   1.58e-02 | 6.59e+01 | 6.14e+00 |   0.2 |  10;2 |  10;2 
##     9 | 0.996399 | 9.076371e+00 |   1.37e-02 | 5.10e+01 | 4.75e+00 |   0.2 |  10;3 |  10;3 
##    10 | 0.996455 | 8.936348e+00 |   1.57e-02 | 7.90e+01 | 7.36e+00 |   0.2 |  10;2 |  10;2 
##    11 | 0.996509 | 8.798872e+00 |   1.56e-02 | 1.22e+02 | 5.69e+00 |   0.2 |  10;3 |  10;2 
##    12 | 0.996576 | 8.630899e+00 |   1.95e-02 | 1.89e+02 | 4.41e+00 |   0.2 |  10;3 |  10;2 
##    13 | 0.996659 | 8.421391e+00 |   2.49e-02 | 1.46e+02 | 3.41e+00 |   0.2 |  10;3 |  10;3 
##    14 | 0.996746 | 8.201882e+00 |   2.68e-02 | 5.67e+01 | 2.64e+00 |   0.2 |  10;3 |  10;4 
##    15 | 0.996831 | 7.988484e+00 |   2.67e-02 | 8.77e+01 | 4.09e+00 |   0.2 |  10;2 |  10;2 
##    16 | 0.996903 | 7.805609e+00 |   2.34e-02 | 1.36e+02 | 3.16e+00 |   0.2 |  10;3 |  10;2 
##    17 | 0.996988 | 7.593019e+00 |   2.80e-02 | 1.05e+02 | 4.90e+00 |   0.2 |  10;2 |  10;3 
##    18 | 0.997093 | 7.327788e+00 |   3.62e-02 | 1.63e+02 | 3.79e+00 |   0.2 |  10;3 |  10;2 
##    19 | 0.997217 | 7.014292e+00 |   4.47e-02 | 1.26e+02 | 5.87e+00 |   0.2 |  10;2 |  10;3 
##    20 | 0.997361 | 6.652245e+00 |   5.44e-02 | 9.75e+01 | 4.54e+00 |   0.2 |  10;3 |  10;3 
##    21 | 0.997530 | 6.226290e+00 |   6.84e-02 | 3.77e+01 | 3.51e+00 |   0.2 |  10;3 |  10;4 
##    22 | 0.997782 | 5.591084e+00 |   1.14e-01 | 2.34e+02 | 5.44e+00 |   0.2 |  10;2 |  10;0 
##    23 | 0.998053 | 4.907894e+00 |   1.39e-01 | 1.81e+02 | 4.21e+00 |   0.2 |  10;3 |  10;3 
##    24 | 0.998337 | 4.190972e+00 |   1.71e-01 | 7.00e+01 | 6.52e+00 |   0.2 |  10;2 |  10;4 
##    25 | 0.998534 | 3.695589e+00 |   1.34e-01 | 6.77e+00 | 5.04e+00 |   0.3 |  10;3 |  10;6 
##    26 | 0.998723 | 3.218836e+00 |   1.48e-01 | 4.19e+01 | 3.90e+00 |   0.2 |  10;3 |  10;0 
##    27 | 0.998941 | 2.670186e+00 |   2.05e-01 | 3.24e+01 | 6.04e+00 |   0.2 |  10;2 |  10;3 
##    28 | 0.999116 | 2.228988e+00 |   1.98e-01 | 5.02e+01 | 4.68e+00 |   0.3 |  10;3 |  10;2 
##    29 | 0.999200 | 2.017297e+00 |   1.05e-01 | 9.71e+00 | 3.62e+00 |   0.3 |  10;3 |  10;5 
##    30 | 0.999278 | 1.818571e+00 |   1.09e-01 | 6.01e+01 | 5.60e+00 |   0.2 |  10;2 |  10;0 
##    31 | 0.999303 | 1.757344e+00 |   3.48e-02 | 4.65e+01 | 4.34e+00 |   0.2 |  10;3 |  10;3 
##    32 | 0.999307 | 1.747236e+00 |   5.79e-03 | 7.21e+01 | 3.36e+00 |   0.2 |  10;3 |  10;2 
##    33 | 0.999310 | 1.740097e+00 |   4.10e-03 | 1.12e+02 | 5.20e+00 |   0.2 |  10;2 |  10;2 
##    34 | 0.999313 | 1.731885e+00 |   4.74e-03 | 1.73e+02 | 4.02e+00 |   0.2 |  10;3 |  10;2 
##    35 | 0.999315 | 1.725400e+00 |   3.76e-03 | 1.34e+02 | 3.11e+00 |   0.2 |  10;3 |  10;3 
##    36 | 0.999316 | 1.723269e+00 |   1.24e-03 | 1.03e+02 | 4.82e+00 |   0.2 |  10;2 |  10;3 
##    37 | 0.999317 | 1.721807e+00 |   8.49e-04 | 1.60e+02 | 3.73e+00 |   0.2 |  10;3 |  10;2 
##    38 | 0.999317 | 1.720818e+00 |   5.75e-04 | 2.48e+02 | 5.77e+00 |   0.3 |  10;2 |  10;2 
##    39 | 0.999318 | 1.720097e+00 |   4.19e-04 | 3.84e+02 | 4.47e+00 |   0.2 |  10;3 |  10;2 
##    40 | 0.999318 | 1.719562e+00 |   3.11e-04 | 2.97e+02 | 3.46e+00 |   0.2 |  10;3 |  10;3 
##    41 | 0.999318 | 1.719134e+00 |   2.49e-04 | 4.60e+02 | 5.35e+00 |   0.2 |  10;2 |  10;2 
##    42 | 0.999318 | 1.718793e+00 |   1.98e-04 | 7.11e+02 | 4.14e+00 |   0.3 |  10;3 |  10;2 
##    43 | 0.999318 | 1.718523e+00 |   1.57e-04 | 5.51e+02 | 3.21e+00 |   0.2 |  10;3 |  10;3 
##    44 | 0.999318 | 1.718305e+00 |   1.27e-04 | 4.26e+02 | 4.96e+00 |   0.3 |  10;2 |  10;3 
##    45 | 0.999318 | 1.718127e+00 |   1.04e-04 | 3.30e+02 | 3.84e+00 |   0.2 |  10;3 |  10;3 
##    46 | 0.999318 | 1.717983e+00 |   8.35e-05 | 2.55e+02 | 2.97e+00 |   0.2 |  10;3 |  10;3 
##    47 | 0.999318 | 1.717875e+00 |   6.33e-05 | 3.95e+02 | 4.60e+00 |   0.2 |  10;2 |  10;2 
##    48 | 0.999318 | 1.717792e+00 |   4.79e-05 | 3.06e+02 | 3.56e+00 |   0.2 |  10;3 |  10;3 
##    49 | 0.999319 | 1.717730e+00 |   3.62e-05 | 4.73e+02 | 5.51e+00 |   0.1 |  10;2 |  10;2 
##    50 | 0.999319 | 1.717683e+00 |   2.74e-05 | 3.66e+02 | 4.27e+00 |   0.2 |  10;3 |  10;3 
##    51 | 0.999319 | 1.717647e+00 |   2.09e-05 | 5.67e+02 | 3.30e+00 |   0.2 |  10;3 |  10;2 
##    52 | 0.999319 | 1.717620e+00 |   1.58e-05 | 4.39e+02 | 5.11e+00 |   0.3 |  10;2 |  10;3 
##    53 | 0.999319 | 1.717599e+00 |   1.20e-05 | 3.40e+02 | 3.96e+00 |   0.2 |  10;3 |  10;3 
##    54 | 0.999319 | 1.717584e+00 |   9.14e-06 | 2.63e+02 | 6.12e+00 |   0.2 |  10;2 |  10;3 
##    55 | 0.999319 | 1.717572e+00 |   6.97e-06 | 4.07e+02 | 4.74e+00 |   0.2 |  10;3 |  10;2 
##    56 | 0.999319 | 1.717563e+00 |   5.32e-06 | 3.15e+02 | 7.34e+00 |   0.2 |  10;2 |  10;3 
##    57 | 0.999319 | 1.717556e+00 |   4.03e-06 | 4.88e+02 | 5.68e+00 |   0.2 |  10;3 |  10;2 
##    58 | 0.999319 | 1.717550e+00 |   3.09e-06 | 3.77e+02 | 4.39e+00 |   0.2 |  10;3 |  10;3 
##    59 | 0.999319 | 1.717546e+00 |   2.36e-06 | 5.84e+02 | 6.80e+00 |   0.2 |  10;2 |  10;2 
##    60 | 0.999319 | 1.717543e+00 |   1.81e-06 | 4.52e+02 | 5.27e+00 |   0.2 |  10;3 |  10;3 
##    61 | 0.999319 | 1.717541e+00 |   1.38e-06 | 3.50e+02 | 4.08e+00 |   0.2 |  10;3 |  10;3 
##    62 | 0.999319 | 1.717539e+00 |   1.05e-06 | 5.42e+02 | 6.31e+00 |   0.1 |  10;2 |  10;2 
##    63 | 0.999319 | 1.717538e+00 |   7.86e-07 | 4.19e+02 | 4.88e+00 |   0.3 |  10;3 |  10;3 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   
##  BY =  
##             x        y
## [1,] 5.430757 3.146258
## [2,] 2.043435 2.710947
## [3,] 3.128401 4.781751
## 

What is the result? A list with names:

names(aa)
##  [1] "BY"              "A"               "B"               "SSE"            
##  [5] "varexpl"         "initialsolution" "freqstable"      "iterations"     
##  [9] "time"            "converges"       "nAup"            "nAdown"         
## [13] "nBup"            "nBdown"          "run_results"     "Y"              
## [17] "data_tables"     "call"

More specifically:

  1. BY is the matrix of archetypes; each row is an archetype

  2. A and B matrices, SSE and varexpl, as defined in the section Archetypal Analysis (AA)

  3. initialsolution gives the rows that were used as initial points in the algorithm

  4. freqstable is a frequency table for all candidate initial points found

  5. iterations is the number of main iterations done by the algorithm

  6. time is the seconds elapsed for the entire process

  7. converges is a TRUE/FALSE flag that tells us whether the convergence criteria were met before the maximum number of iterations (default=2000) was reached

  8. nAup, nAdown, nBup, nBdown are for deep inspection of the PCHA algorithm

  9. run_results is a list with “iterations” lists, each one having the following components:

    • SSE
    • varexpl
    • B
    • BY

The archetypes are indeed on the outer boundary; more precisely, they form a principal convex hull of the data points:

If you look more closely at the above figure, you’ll see that the inner triangle is approximately similar to the outer one, which is the true solution although its vertices are not present in the data set.

Let’s plot the convergence process of SSE over all iterations:

It seems that all the “hard work” was done during the first 30 iterations.
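
The |dSSE|/SSE column of the verbose log can be reproduced from the SSE column. The formula below is inferred from the printed numbers, not taken from the package source, so treat it as an assumption: the absolute change between consecutive iterations divided by the newer SSE.

```r
# SSE values for iterations 1 to 4, copied from the verbose log above
sse <- c(1.347706e+01, 1.106752e+01, 1.014499e+01, 9.818920e+00)

# Relative change between consecutive iterations, measured against the newer SSE
rel_change <- abs(diff(sse)) / sse[-1]
signif(rel_change, 3)
## [1] 0.2180 0.0909 0.0332
```

These values agree with the 2.18e-01, 9.09e-02 and 3.32e-02 reported for iterations 2 to 4. The run stops at iteration 63 with 7.86e-07, which would be consistent with a tolerance of about 1e-06 on this quantity, although that threshold is likewise an inference from the log.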

But we can also check the quality of our solution. Let’s see exactly how the final archetypes are built from data points and which weights are used for that task:

BB=aa$B
yy=check_Bmatrix(B = BB, chvertices = NULL, verbose = TRUE)
## Archetype 1 is a mixture of only next rows with weights: 
##   
##        34         5 
## 0.5441943 0.4558057 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 2 is a mixture of only next rows with weights: 
##   
##        62        52 
## 0.7808615 0.2191385 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 3 is a mixture of only next rows with weights: 
##   
##         86         91 
## 0.90628955 0.09371045 
##   
## Weights add to: 
## [1] 1
##   
## 
# yy$used_rows
# yy$used_weights
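
As a quick sanity check, the first archetype can be rebuilt by hand from the two rows and weights reported above. The numbers are copied from the printed output, so the result is only accurate up to their rounding; the variable names are illustrative.

```r
# Rows 34 and 5 of df, copied from the printed output above
y34 <- c(x = 5.687791, y = 3.481611)
y5  <- c(x = 5.123878, y = 2.745874)
w   <- c(0.5441943, 0.4558057)   # weights reported for archetype 1

# A row of BY is the weighted sum of the data rows selected by that row of B
arch1 <- w[1] * y34 + w[2] * y5
round(arch1, 5)
```

The result agrees with the first row of BY (5.430757, 3.146258) to about five decimal places, which is what the relation BY = B Y predicts.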

What exactly is the CH of our data set? Since the data set is 2D, we can find it with the “chull” function:

ch=chull(df)
ch
## [1]  5 26 52 62 43 91 86 89 34
df[ch,]
##           x        y
## 5  5.123878 2.745874
## 26 3.806534 2.546629
## 52 2.334330 2.416794
## 62 1.961799 2.793497
## 43 2.031007 3.513241
## 91 2.428351 4.022825
## 86 3.200786 4.860224
## 89 5.653335 3.618327
## 34 5.687791 3.481611

So the rows used in AA indeed belong to the CH:

yy$used_rows
## [1] 34  5 62 52 86 91
unlist(yy$used_rows)%in%ch
## [1] TRUE TRUE TRUE TRUE TRUE TRUE

Question: Can we further decrease the computation time?

Answer: Yes, if we make clever use of the B matrix.

Let’s plot the final used points:

Observe now that the final archetypes are points “somewhere” on the line segments connecting the final used points, with weights given by the function “check_Bmatrix()”. From these weights we see that each archetype is closer to the first point of its list element, so it is reasonable to try the vector of rows c(34,62,86) as initial solution, because those rows have the greatest weights:

aa2=archetypal(df=df,kappas = 3,initialrows =  c(34,62,86), verbose = TRUE,rseed=9102,save_history = TRUE)
## [1] 34 62 86
## The initial solution that will be used is 
##           x        y
## 34 5.687791 3.481611
## 62 1.961799 2.793497
## 86 3.200786 4.860224
##   
## Time for the 25 initial A updates was 0.22 secs 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   Iter|   VarExpl|          SSE | |dSSE|/SSE |       muB|       muA| t(sec)|Aup;dwn|Bup;dwn 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##     1 | 0.998844 | 2.913625e+00 |   1.01e-01 | 6.19e+00 | 4.61e+00 |   0.2 |  10;2 |  10;0 
##     2 | 0.998937 | 2.678386e+00 |   8.78e-02 | 4.79e+00 | 3.57e+00 |   0.2 |  10;3 |  10;3 
##     3 | 0.999023 | 2.463454e+00 |   8.72e-02 | 1.48e+01 | 5.53e+00 |   0.2 |  10;2 |  10;1 
##     4 | 0.999079 | 2.321105e+00 |   6.13e-02 | 1.15e+01 | 4.28e+00 |   0.2 |  10;3 |  10;3 
##     5 | 0.999179 | 2.068418e+00 |   1.22e-01 | 7.11e+01 | 3.31e+00 |   0.2 |  10;3 |  10;0 
##     6 | 0.999230 | 1.940781e+00 |   6.58e-02 | 5.50e+01 | 5.13e+00 |   0.1 |  10;2 |  10;3 
##     7 | 0.999280 | 1.815789e+00 |   6.88e-02 | 3.41e+02 | 3.97e+00 |   0.3 |  10;3 |  10;0 
##     8 | 0.999299 | 1.768135e+00 |   2.70e-02 | 5.27e+02 | 3.07e+00 |   0.2 |  10;3 |  10;2 
##     9 | 0.999305 | 1.751790e+00 |   9.33e-03 | 4.08e+02 | 4.75e+00 |   0.2 |  10;2 |  10;3 
##    10 | 0.999308 | 1.743209e+00 |   4.92e-03 | 1.58e+02 | 3.68e+00 |   0.3 |  10;3 |  10;4 
##    11 | 0.999311 | 1.737558e+00 |   3.25e-03 | 2.45e+02 | 5.69e+00 |   0.1 |  10;2 |  10;2 
##    12 | 0.999312 | 1.733653e+00 |   2.25e-03 | 1.89e+02 | 4.41e+00 |   0.2 |  10;3 |  10;3 
##    13 | 0.999313 | 1.730675e+00 |   1.72e-03 | 2.93e+02 | 3.41e+00 |   0.1 |  10;3 |  10;2 
##    14 | 0.999314 | 1.728367e+00 |   1.34e-03 | 2.27e+02 | 5.28e+00 |   0.1 |  10;2 |  10;3 
##    15 | 0.999315 | 1.726586e+00 |   1.03e-03 | 1.75e+02 | 4.09e+00 |   0.1 |  10;3 |  10;3 
##    16 | 0.999316 | 1.725153e+00 |   8.31e-04 | 2.72e+02 | 3.16e+00 |   0.1 |  10;3 |  10;2 
##    17 | 0.999316 | 1.724060e+00 |   6.34e-04 | 4.20e+02 | 4.90e+00 |   0.1 |  10;2 |  10;2 
##    18 | 0.999316 | 1.723044e+00 |   5.90e-04 | 3.25e+02 | 3.79e+00 |   0.2 |  10;3 |  10;3 
##    19 | 0.999317 | 1.722139e+00 |   5.26e-04 | 5.04e+02 | 5.87e+00 |   0.2 |  10;2 |  10;2 
##    20 | 0.999317 | 1.721367e+00 |   4.48e-04 | 3.90e+02 | 4.54e+00 |   0.2 |  10;3 |  10;3 
##    21 | 0.999317 | 1.720589e+00 |   4.52e-04 | 6.04e+02 | 3.51e+00 |   0.2 |  10;3 |  10;2 
##    22 | 0.999318 | 1.719942e+00 |   3.76e-04 | 4.67e+02 | 5.44e+00 |   0.1 |  10;2 |  10;3 
##    23 | 0.999318 | 1.719357e+00 |   3.40e-04 | 7.23e+02 | 4.21e+00 |   0.1 |  10;3 |  10;2 
##    24 | 0.999318 | 1.718724e+00 |   3.68e-04 | 5.60e+02 | 6.52e+00 |   0.2 |  10;2 |  10;3 
##    25 | 0.999318 | 1.718419e+00 |   1.78e-04 | 4.33e+02 | 5.04e+00 |   0.2 |  10;3 |  10;3 
##    26 | 0.999318 | 1.718228e+00 |   1.11e-04 | 6.70e+02 | 3.90e+00 |   0.2 |  10;3 |  10;2 
##    27 | 0.999318 | 1.718070e+00 |   9.21e-05 | 5.19e+02 | 6.04e+00 |   0.2 |  10;2 |  10;3 
##    28 | 0.999318 | 1.717939e+00 |   7.59e-05 | 4.02e+02 | 4.68e+00 |   0.2 |  10;3 |  10;3 
##    29 | 0.999318 | 1.717839e+00 |   5.84e-05 | 3.11e+02 | 3.62e+00 |   0.2 |  10;3 |  10;3 
##    30 | 0.999318 | 1.717764e+00 |   4.35e-05 | 4.81e+02 | 5.60e+00 |   0.2 |  10;2 |  10;2 
##    31 | 0.999319 | 1.717708e+00 |   3.28e-05 | 3.72e+02 | 4.34e+00 |   0.2 |  10;3 |  10;3 
##    32 | 0.999319 | 1.717666e+00 |   2.47e-05 | 2.88e+02 | 6.71e+00 |   0.1 |  10;2 |  10;3 
##    33 | 0.999319 | 1.717634e+00 |   1.86e-05 | 4.46e+02 | 5.20e+00 |   0.2 |  10;3 |  10;2 
##    34 | 0.999319 | 1.717609e+00 |   1.41e-05 | 3.45e+02 | 4.02e+00 |   0.2 |  10;3 |  10;3 
##    35 | 0.999319 | 1.717591e+00 |   1.06e-05 | 5.34e+02 | 6.22e+00 |   0.2 |  10;2 |  10;2 
##    36 | 0.999319 | 1.717577e+00 |   8.03e-06 | 4.14e+02 | 4.82e+00 |   0.2 |  10;3 |  10;3 
##    37 | 0.999319 | 1.717567e+00 |   6.08e-06 | 6.40e+02 | 3.73e+00 |   0.1 |  10;3 |  10;2 
##    38 | 0.999319 | 1.717559e+00 |   4.68e-06 | 4.96e+02 | 5.77e+00 |   0.1 |  10;2 |  10;3 
##    39 | 0.999319 | 1.717553e+00 |   3.57e-06 | 3.84e+02 | 4.47e+00 |   0.1 |  10;3 |  10;3 
##    40 | 0.999319 | 1.717548e+00 |   2.70e-06 | 2.97e+02 | 6.91e+00 |   0.1 |  10;2 |  10;3 
##    41 | 0.999319 | 1.717545e+00 |   2.07e-06 | 4.60e+02 | 5.35e+00 |   0.3 |  10;3 |  10;2 
##    42 | 0.999319 | 1.717542e+00 |   1.55e-06 | 7.11e+02 | 4.14e+00 |   0.2 |  10;3 |  10;2 
##    43 | 0.999319 | 1.717540e+00 |   1.20e-06 | 5.51e+02 | 6.41e+00 |   0.1 |  10;2 |  10;3 
##    44 | 0.999319 | 1.717538e+00 |   9.13e-07 | 4.26e+02 | 4.96e+00 |   0.2 |  10;3 |  10;3 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   
##  BY =  
##             x        y
## [1,] 5.430821 3.146342
## [2,] 2.043495 2.710886
## [3,] 3.128494 4.781853
## 
yy2=check_Bmatrix(aa2$B,verbose = TRUE)
## Archetype 1 is a mixture of only next rows with weights: 
##   
##        34         5 
## 0.5443082 0.4556918 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 2 is a mixture of only next rows with weights: 
##   
##     62     52 
## 0.7807 0.2193 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 3 is a mixture of only next rows with weights: 
##   
##         86         91 
## 0.90641072 0.09358928 
##   
## Weights add to: 
## [1] 1
##   
## 

Practically the same solution as before, but with 44 instead of 63 iterations!
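
The choice of starting rows made above can also be automated. A minimal sketch follows, with the weights typed in by hand from the first check_Bmatrix() output; the list layout is an assumption for illustration and not necessarily the structure the package returns.

```r
# Weights per archetype, copied from the first check_Bmatrix() output above.
# Names are the row numbers of df; the list layout is illustrative only.
used_weights <- list(
  c("34" = 0.5441943,  "5"  = 0.4558057),
  c("62" = 0.7808615,  "52" = 0.2191385),
  c("86" = 0.90628955, "91" = 0.09371045)
)

# For each archetype keep the row that carries the largest weight
leading <- sapply(used_weights, function(w) as.integer(names(w)[which.max(w)]))
leading
## [1] 34 62 86
```

This reproduces the vector passed to “initialrows” above. The 3D example that follows reads the same kind of information directly from the leading_rows component of the check_Bmatrix() result.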

Now we’ll try a 3D example.

library(plot3D)
#
p1=c(3,0,0);p2=c(0,5,0);p3=c(3,5,7);p4=c(0,0,0);
dp=data.frame(rbind(p1,p2,p3,p4));dp=dp[chull(dp),];colnames(dp)=c("x","y","z")
set.seed(9102)
df=data.frame(t(sapply(1:100, function(i,dp){
  cc=runif(4)
  cc=cc/sum(cc)
  colSums(dp*cc)
},dp)))
colnames(df)=c("x","y","z")
scatter3D(x=dp$x,y=dp$y,z=dp$z,colvar=NULL,lwd = 2, d = 3,xlab='x',ylab='y',zlab='z',theta=120,phi=15,
          main = "Generators and Data Points", bty ="g",ticktype = "detailed",col='black',pch=10,cex=2.5)
points3D(x=df$x,y=df$y,z=df$z,col='blue',add=TRUE,pch=19)

The above data frame (without the generating edges) can be loaded as:

# data("wd3")
# df=wd3

(The two data frames agree to at least the 16th decimal place.)

We run the “archetypal” function with kappas=4 (due to the construction procedure) and then we also check the final B matrix:

aa3 = archetypal(df = df, kappas = 4, verbose = TRUE, rseed = 9102, save_history = TRUE)
## Time for computing Projected Convex Hull was 0.01 secs
## Next projected convex hull initial solution will be used... 
##            x        y          z
## 61 2.2751335 3.993475 4.67136711
## 82 0.8598104 1.072867 0.06680034
## 64 2.0259680 0.923847 0.02359727
## 67 1.8788059 4.375142 4.34740317
##   
## Time for the 25 initial A updates was 0.16 secs 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   Iter|   VarExpl|          SSE | |dSSE|/SSE |       muB|       muA| t(sec)|Aup;dwn|Bup;dwn 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##     1 | 0.986296 | 1.925236e+01 |   5.65e-01 | 1.55e+00 | 1.15e+00 |   0.2 |  10;2 |  10;2 
##     2 | 0.989115 | 1.529170e+01 |   2.59e-01 | 4.79e+00 | 1.79e+00 |   0.1 |  10;2 |  10;1 
##     3 | 0.990390 | 1.350085e+01 |   1.33e-01 | 7.42e+00 | 6.91e-01 |   0.2 |  10;4 |  10;2 
##     4 | 0.991364 | 1.213181e+01 |   1.13e-01 | 2.87e+00 | 1.07e+00 |   0.2 |  10;2 |  10;4 
##     5 | 0.992137 | 1.104640e+01 |   9.83e-02 | 1.78e+01 | 8.28e-01 |   0.1 |  10;3 |  10;0 
##     6 | 0.992767 | 1.016073e+01 |   8.72e-02 | 2.75e+01 | 1.28e+00 |   0.2 |  10;2 |  10;2 
##     7 | 0.993440 | 9.216011e+00 |   1.03e-01 | 1.06e+01 | 9.92e-01 |   0.2 |  10;3 |  10;4 
##     8 | 0.993924 | 8.535612e+00 |   7.97e-02 | 6.59e+01 | 7.68e-01 |   0.1 |  10;3 |  10;0 
##     9 | 0.994329 | 7.966527e+00 |   7.14e-02 | 5.10e+01 | 1.19e+00 |   0.2 |  10;2 |  10;3 
##    10 | 0.994699 | 7.447174e+00 |   6.97e-02 | 7.90e+01 | 9.20e-01 |   0.1 |  10;3 |  10;2 
##    11 | 0.995019 | 6.997155e+00 |   6.43e-02 | 7.64e+00 | 7.12e-01 |   0.2 |  10;3 |  10;6 
##    12 | 0.995328 | 6.563703e+00 |   6.60e-02 | 4.73e+01 | 1.10e+00 |   0.1 |  10;2 |  10;0 
##    13 | 0.995604 | 6.175204e+00 |   6.29e-02 | 7.32e+01 | 1.71e+00 |   0.1 |  10;2 |  10;2 
##    14 | 0.995841 | 5.842323e+00 |   5.70e-02 | 2.83e+01 | 1.32e+00 |   0.2 |  10;3 |  10;4 
##    15 | 0.996036 | 5.568624e+00 |   4.92e-02 | 4.39e+01 | 1.02e+00 |   0.2 |  10;3 |  10;2 
##    16 | 0.996126 | 5.441725e+00 |   2.33e-02 | 1.36e+02 | 7.91e-01 |   0.2 |  10;3 |  10;1 
##    17 | 0.996213 | 5.320133e+00 |   2.29e-02 | 1.05e+02 | 1.22e+00 |   0.2 |  10;2 |  10;3 
##    18 | 0.996300 | 5.197329e+00 |   2.36e-02 | 8.14e+01 | 9.47e-01 |   0.3 |  10;3 |  10;3 
##    19 | 0.996387 | 5.075748e+00 |   2.40e-02 | 1.26e+02 | 7.33e-01 |   0.2 |  10;3 |  10;2 
##    20 | 0.996489 | 4.932586e+00 |   2.90e-02 | 9.75e+01 | 1.14e+00 |   0.2 |  10;2 |  10;3 
##    21 | 0.996596 | 4.781988e+00 |   3.15e-02 | 1.51e+02 | 8.78e-01 |   0.2 |  10;3 |  10;2 
##    22 | 0.996688 | 4.652785e+00 |   2.78e-02 | 1.17e+02 | 1.36e+00 |   0.2 |  10;2 |  10;3 
##    23 | 0.996764 | 4.546152e+00 |   2.35e-02 | 9.04e+01 | 5.26e-01 |   0.3 |  10;4 |  10;3 
##    24 | 0.996831 | 4.451938e+00 |   2.12e-02 | 7.00e+01 | 8.15e-01 |   0.2 |  10;2 |  10;3 
##    25 | 0.996890 | 4.369416e+00 |   1.89e-02 | 5.41e+01 | 1.26e+00 |   0.2 |  10;2 |  10;3 
##    26 | 0.996942 | 4.296367e+00 |   1.70e-02 | 4.19e+01 | 9.76e-01 |   0.2 |  10;3 |  10;3 
##    27 | 0.996988 | 4.231817e+00 |   1.53e-02 | 6.49e+01 | 7.55e-01 |   0.2 |  10;3 |  10;2 
##    28 | 0.997031 | 4.170377e+00 |   1.47e-02 | 5.02e+01 | 1.17e+00 |   0.2 |  10;2 |  10;3 
##    29 | 0.997068 | 4.118687e+00 |   1.26e-02 | 3.89e+01 | 9.05e-01 |   0.3 |  10;3 |  10;3 
##    30 | 0.997102 | 4.070661e+00 |   1.18e-02 | 6.01e+01 | 7.00e-01 |   0.3 |  10;3 |  10;2 
##    31 | 0.997135 | 4.024242e+00 |   1.15e-02 | 4.65e+01 | 1.08e+00 |   0.2 |  10;2 |  10;3 
##    32 | 0.997164 | 3.983658e+00 |   1.02e-02 | 3.60e+01 | 8.39e-01 |   0.3 |  10;3 |  10;3 
##    33 | 0.997193 | 3.943674e+00 |   1.01e-02 | 1.12e+02 | 1.30e+00 |   0.2 |  10;2 |  10;1 
##    34 | 0.997220 | 3.905030e+00 |   9.90e-03 | 2.16e+01 | 1.01e+00 |   0.2 |  10;3 |  10;5 
##    35 | 0.997242 | 3.873923e+00 |   8.03e-03 | 1.34e+02 | 7.78e-01 |   0.2 |  10;3 |  10;0 
##    36 | 0.997259 | 3.850079e+00 |   6.19e-03 | 2.59e+01 | 1.20e+00 |   0.3 |  10;2 |  10;5 
##    37 | 0.997276 | 3.827417e+00 |   5.92e-03 | 1.60e+02 | 9.32e-01 |   0.2 |  10;3 |  10;0 
##    38 | 0.997290 | 3.807374e+00 |   5.26e-03 | 1.24e+02 | 1.44e+00 |   0.3 |  10;2 |  10;3 
##    39 | 0.997298 | 3.795453e+00 |   3.14e-03 | 1.92e+02 | 1.12e+00 |   0.2 |  10;3 |  10;2 
##    40 | 0.997306 | 3.785347e+00 |   2.67e-03 | 2.97e+02 | 8.64e-01 |   0.2 |  10;3 |  10;2 
##    41 | 0.997312 | 3.776421e+00 |   2.36e-03 | 5.74e+01 | 1.34e+00 |   0.2 |  10;2 |  10;5 
##    42 | 0.997318 | 3.767308e+00 |   2.42e-03 | 3.56e+02 | 1.04e+00 |   0.2 |  10;3 |  10;0 
##    43 | 0.997324 | 3.759626e+00 |   2.04e-03 | 1.38e+02 | 8.01e-01 |   0.2 |  10;3 |  10;4 
##    44 | 0.997328 | 3.753137e+00 |   1.73e-03 | 2.13e+02 | 1.24e+00 |   0.3 |  10;2 |  10;2 
##    45 | 0.997333 | 3.747007e+00 |   1.64e-03 | 1.65e+02 | 9.60e-01 |   0.2 |  10;3 |  10;3 
##    46 | 0.997337 | 3.741376e+00 |   1.51e-03 | 1.28e+02 | 7.43e-01 |   0.2 |  10;3 |  10;3 
##    47 | 0.997340 | 3.736529e+00 |   1.30e-03 | 1.98e+02 | 1.15e+00 |   0.2 |  10;2 |  10;2 
##    48 | 0.997343 | 3.732100e+00 |   1.19e-03 | 1.53e+02 | 8.90e-01 |   0.2 |  10;3 |  10;3 
##    49 | 0.997346 | 3.728154e+00 |   1.06e-03 | 1.18e+02 | 1.38e+00 |   0.2 |  10;2 |  10;3 
##    50 | 0.997349 | 3.724836e+00 |   8.91e-04 | 1.83e+02 | 5.33e-01 |   0.1 |  10;4 |  10;2 
##    51 | 0.997351 | 3.721744e+00 |   8.31e-04 | 1.42e+02 | 8.26e-01 |   0.2 |  10;2 |  10;3 
##    52 | 0.997353 | 3.719002e+00 |   7.37e-04 | 2.19e+02 | 1.28e+00 |   0.1 |  10;2 |  10;2 
##    53 | 0.997355 | 3.716469e+00 |   6.82e-04 | 1.70e+02 | 9.89e-01 |   0.3 |  10;3 |  10;3 
##    54 | 0.997356 | 3.714216e+00 |   6.06e-04 | 2.63e+02 | 1.53e+00 |   0.2 |  10;2 |  10;2 
##    55 | 0.997358 | 3.712217e+00 |   5.38e-04 | 2.04e+02 | 1.18e+00 |   0.2 |  10;3 |  10;3 
##    56 | 0.997359 | 3.710408e+00 |   4.88e-04 | 1.58e+02 | 1.83e+00 |   0.2 |  10;2 |  10;3 
##    57 | 0.997360 | 3.708806e+00 |   4.32e-04 | 1.22e+02 | 7.10e-01 |   0.2 |  10;4 |  10;3 
##    58 | 0.997361 | 3.707389e+00 |   3.82e-04 | 1.89e+02 | 1.10e+00 |   0.2 |  10;2 |  10;2 
##    59 | 0.997362 | 3.706106e+00 |   3.46e-04 | 1.46e+02 | 8.50e-01 |   0.2 |  10;3 |  10;3 
##    60 | 0.997363 | 3.705032e+00 |   2.90e-04 | 2.26e+02 | 1.32e+00 |   0.2 |  10;2 |  10;2 
##    61 | 0.997363 | 3.704008e+00 |   2.77e-04 | 3.50e+02 | 1.02e+00 |   0.3 |  10;3 |  10;2 
##    62 | 0.997364 | 3.703117e+00 |   2.40e-04 | 2.71e+02 | 7.89e-01 |   0.2 |  10;3 |  10;3 
##    63 | 0.997365 | 3.702355e+00 |   2.06e-04 | 2.10e+02 | 1.22e+00 |   0.2 |  10;2 |  10;3 
##    64 | 0.997365 | 3.701661e+00 |   1.87e-04 | 3.24e+02 | 9.45e-01 |   0.2 |  10;3 |  10;2 
##    65 | 0.997366 | 3.701031e+00 |   1.70e-04 | 2.51e+02 | 7.31e-01 |   0.2 |  10;3 |  10;3 
##    66 | 0.997366 | 3.700483e+00 |   1.48e-04 | 1.94e+02 | 1.13e+00 |   0.2 |  10;2 |  10;3 
##    67 | 0.997366 | 3.699995e+00 |   1.32e-04 | 3.01e+02 | 1.75e+00 |   0.3 |  10;2 |  10;2 
##    68 | 0.997367 | 3.699538e+00 |   1.23e-04 | 1.16e+02 | 6.78e-01 |   0.3 |  10;4 |  10;4 
##    69 | 0.997367 | 3.699150e+00 |   1.05e-04 | 1.80e+02 | 1.05e+00 |   0.2 |  10;2 |  10;2 
##    70 | 0.997367 | 3.698816e+00 |   9.03e-05 | 1.39e+02 | 1.62e+00 |   0.2 |  10;2 |  10;3 
##    71 | 0.997367 | 3.698521e+00 |   7.98e-05 | 2.16e+02 | 1.26e+00 |   0.1 |  10;3 |  10;2 
##    72 | 0.997367 | 3.698260e+00 |   7.04e-05 | 3.34e+02 | 9.73e-01 |   0.2 |  10;3 |  10;2 
##    73 | 0.997368 | 3.698031e+00 |   6.21e-05 | 1.29e+02 | 1.51e+00 |   0.2 |  10;2 |  10;4 
##    74 | 0.997368 | 3.697827e+00 |   5.50e-05 | 2.00e+02 | 1.17e+00 |   0.3 |  10;3 |  10;2 
##    75 | 0.997368 | 3.697647e+00 |   4.88e-05 | 1.55e+02 | 1.80e+00 |   0.2 |  10;2 |  10;3 
##    76 | 0.997368 | 3.697487e+00 |   4.31e-05 | 2.40e+02 | 6.98e-01 |   0.2 |  10;4 |  10;2 
##    77 | 0.997368 | 3.697347e+00 |   3.81e-05 | 1.86e+02 | 1.08e+00 |   0.2 |  10;2 |  10;3 
##    78 | 0.997368 | 3.697222e+00 |   3.37e-05 | 2.87e+02 | 8.37e-01 |   0.3 |  10;3 |  10;2 
##    79 | 0.997368 | 3.697112e+00 |   2.97e-05 | 2.22e+02 | 1.30e+00 |   0.2 |  10;2 |  10;3 
##    80 | 0.997368 | 3.697015e+00 |   2.63e-05 | 1.72e+02 | 1.00e+00 |   0.2 |  10;3 |  10;3 
##    81 | 0.997368 | 3.696929e+00 |   2.33e-05 | 1.33e+02 | 7.76e-01 |   0.2 |  10;3 |  10;3 
##    82 | 0.997368 | 3.696853e+00 |   2.05e-05 | 2.06e+02 | 1.20e+00 |   0.2 |  10;2 |  10;2 
##    83 | 0.997369 | 3.696786e+00 |   1.82e-05 | 1.60e+02 | 9.29e-01 |   0.3 |  10;3 |  10;3 
##    84 | 0.997369 | 3.696726e+00 |   1.61e-05 | 2.47e+02 | 7.19e-01 |   0.2 |  10;3 |  10;2 
##    85 | 0.997369 | 3.696673e+00 |   1.42e-05 | 1.91e+02 | 1.11e+00 |   0.2 |  10;2 |  10;3 
##    86 | 0.997369 | 3.696627e+00 |   1.26e-05 | 2.96e+02 | 8.62e-01 |   0.1 |  10;3 |  10;2 
##    87 | 0.997369 | 3.696586e+00 |   1.11e-05 | 2.29e+02 | 1.33e+00 |   0.2 |  10;2 |  10;3 
##    88 | 0.997369 | 3.696549e+00 |   9.88e-06 | 1.77e+02 | 5.16e-01 |   0.1 |  10;4 |  10;3 
##    89 | 0.997369 | 3.696517e+00 |   8.75e-06 | 2.74e+02 | 7.99e-01 |   0.2 |  10;2 |  10;2 
##    90 | 0.997369 | 3.696489e+00 |   7.66e-06 | 2.12e+02 | 6.18e-01 |   0.2 |  10;3 |  10;3 
##    91 | 0.997369 | 3.696463e+00 |   6.78e-06 | 1.64e+02 | 9.57e-01 |   0.2 |  10;2 |  10;3 
##    92 | 0.997369 | 3.696441e+00 |   6.00e-06 | 2.55e+02 | 7.41e-01 |   0.4 |  10;3 |  10;2 
##    93 | 0.997369 | 3.696422e+00 |   5.32e-06 | 1.97e+02 | 1.15e+00 |   0.3 |  10;2 |  10;3 
##    94 | 0.997369 | 3.696404e+00 |   4.68e-06 | 3.05e+02 | 8.88e-01 |   0.2 |  10;3 |  10;2 
##    95 | 0.997369 | 3.696389e+00 |   4.15e-06 | 2.36e+02 | 1.37e+00 |   0.2 |  10;2 |  10;3 
##    96 | 0.997369 | 3.696376e+00 |   3.65e-06 | 1.83e+02 | 1.06e+00 |   0.2 |  10;3 |  10;3 
##    97 | 0.997369 | 3.696364e+00 |   3.23e-06 | 2.83e+02 | 8.23e-01 |   0.2 |  10;3 |  10;2 
##    98 | 0.997369 | 3.696353e+00 |   2.82e-06 | 2.19e+02 | 1.27e+00 |   0.2 |  10;2 |  10;3 
##    99 | 0.997369 | 3.696344e+00 |   2.54e-06 | 3.39e+02 | 9.86e-01 |   0.2 |  10;3 |  10;2 
##   100 | 0.997369 | 3.696336e+00 |   2.23e-06 | 2.62e+02 | 7.63e-01 |   0.2 |  10;3 |  10;3 
##   101 | 0.997369 | 3.696328e+00 |   1.96e-06 | 2.03e+02 | 1.18e+00 |   0.1 |  10;2 |  10;3 
##   102 | 0.997369 | 3.696322e+00 |   1.73e-06 | 3.14e+02 | 9.14e-01 |   0.1 |  10;3 |  10;2 
##   103 | 0.997369 | 3.696316e+00 |   1.56e-06 | 2.43e+02 | 1.42e+00 |   0.1 |  10;2 |  10;3 
##   104 | 0.997369 | 3.696311e+00 |   1.36e-06 | 1.88e+02 | 1.10e+00 |   0.1 |  10;3 |  10;3 
##   105 | 0.997369 | 3.696307e+00 |   1.19e-06 | 2.91e+02 | 8.48e-01 |   0.2 |  10;3 |  10;2 
##   106 | 0.997369 | 3.696303e+00 |   1.07e-06 | 2.25e+02 | 1.31e+00 |   0.2 |  10;2 |  10;3 
##   107 | 0.997369 | 3.696299e+00 |   9.40e-07 | 3.49e+02 | 1.02e+00 |   0.2 |  10;3 |  10;2 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   
##  BY =  
##              x         y         z
## [1,] 2.2751335 3.9934754 4.6713671
## [2,] 0.7122153 0.8875855 0.2730237
## [3,] 0.7056238 3.9337320 1.0005031
## [4,] 1.9953569 0.9697649 0.2951150
## 
yy3 = check_Bmatrix(aa3$B)
## Archetype 1 is a mixture of only next rows with weights: 
##   
## 61 
##  1 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 2 is a mixture of only next rows with weights: 
##   
## 56 
##  1 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 3 is a mixture of only next rows with weights: 
##   
##        84         9 
## 0.5076314 0.4923686 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 4 is a mixture of only next rows with weights: 
##   
##        64        90 
## 0.8289871 0.1710129 
##   
## Weights add to: 
## [1] 1
##   
## 

Well done, but can we get there in fewer iterations? Let’s choose the rows with the greatest weights among those finally used and then run “archetypal” with those “initialrows”:

irows=yy3$leading_rows
aa4 = archetypal(df = df, kappas = 4, initialrows = irows, verbose = TRUE, rseed = 9102, save_history = TRUE)
## [1] 61 56 84 64
## The initial solution that will be used is 
##            x         y          z
## 61 2.2751335 3.9934754 4.67136711
## 56 0.7122153 0.8875855 0.27302372
## 84 0.9195260 4.4026407 1.52208421
## 64 2.0259680 0.9238470 0.02359727
##   
## Time for the 25 initial A updates was 0.27 secs 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   Iter|   VarExpl|          SSE | |dSSE|/SSE |       muB|       muA| t(sec)|Aup;dwn|Bup;dwn 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##     1 | 0.996837 | 4.442885e+00 |   1.50e-01 | 6.19e+00 | 1.15e+00 |   0.1 |  10;2 |  10;0 
##     2 | 0.997042 | 4.156170e+00 |   6.90e-02 | 3.83e+01 | 8.93e-01 |   0.1 |  10;3 |  10;0 
##     3 | 0.997149 | 4.005012e+00 |   3.77e-02 | 2.37e+02 | 1.38e+00 |   0.1 |  10;2 |  10;0 
##     4 | 0.997215 | 3.912435e+00 |   2.37e-02 | 9.19e+01 | 1.07e+00 |   0.2 |  10;3 |  10;4 
##     5 | 0.997250 | 3.862796e+00 |   1.29e-02 | 1.42e+02 | 8.28e-01 |   0.2 |  10;3 |  10;2 
##     6 | 0.997273 | 3.830356e+00 |   8.47e-03 | 2.20e+02 | 6.41e-01 |   0.2 |  10;3 |  10;2 
##     7 | 0.997290 | 3.807113e+00 |   6.11e-03 | 1.70e+02 | 9.92e-01 |   0.2 |  10;2 |  10;3 
##     8 | 0.997303 | 3.789357e+00 |   4.69e-03 | 2.64e+02 | 7.68e-01 |   0.2 |  10;3 |  10;2 
##     9 | 0.997313 | 3.775430e+00 |   3.69e-03 | 2.04e+02 | 5.94e-01 |   0.2 |  10;3 |  10;3 
##    10 | 0.997321 | 3.764147e+00 |   3.00e-03 | 1.58e+02 | 9.20e-01 |   0.2 |  10;2 |  10;3 
##    11 | 0.997327 | 3.754830e+00 |   2.48e-03 | 1.22e+02 | 1.42e+00 |   0.2 |  10;2 |  10;3 
##    12 | 0.997333 | 3.747108e+00 |   2.06e-03 | 1.89e+02 | 1.10e+00 |   0.3 |  10;3 |  10;2 
##    13 | 0.997337 | 3.740522e+00 |   1.76e-03 | 1.46e+02 | 8.53e-01 |   0.2 |  10;3 |  10;3 
##    14 | 0.997341 | 3.734977e+00 |   1.48e-03 | 2.27e+02 | 1.32e+00 |   0.1 |  10;2 |  10;2 
##    15 | 0.997345 | 3.730177e+00 |   1.29e-03 | 1.75e+02 | 1.02e+00 |   0.1 |  10;3 |  10;3 
##    16 | 0.997348 | 3.726058e+00 |   1.11e-03 | 2.72e+02 | 7.91e-01 |   0.2 |  10;3 |  10;2 
##    17 | 0.997350 | 3.722514e+00 |   9.52e-04 | 2.10e+02 | 1.22e+00 |   0.2 |  10;2 |  10;3 
##    18 | 0.997352 | 3.719410e+00 |   8.35e-04 | 1.63e+02 | 9.47e-01 |   0.2 |  10;3 |  10;3 
##    19 | 0.997354 | 3.716711e+00 |   7.26e-04 | 1.26e+02 | 7.33e-01 |   0.2 |  10;3 |  10;3 
##    20 | 0.997356 | 3.714355e+00 |   6.34e-04 | 1.95e+02 | 1.14e+00 |   0.2 |  10;2 |  10;2 
##    21 | 0.997358 | 3.712269e+00 |   5.62e-04 | 1.51e+02 | 8.78e-01 |   0.3 |  10;3 |  10;3 
##    22 | 0.997359 | 3.710432e+00 |   4.95e-04 | 2.34e+02 | 1.36e+00 |   0.2 |  10;2 |  10;2 
##    23 | 0.997360 | 3.708810e+00 |   4.37e-04 | 1.81e+02 | 1.05e+00 |   0.2 |  10;3 |  10;3 
##    24 | 0.997361 | 3.707381e+00 |   3.85e-04 | 2.80e+02 | 8.15e-01 |   0.1 |  10;3 |  10;2 
##    25 | 0.997362 | 3.706116e+00 |   3.41e-04 | 2.17e+02 | 1.26e+00 |   0.2 |  10;2 |  10;3 
##    26 | 0.997363 | 3.704998e+00 |   3.02e-04 | 1.68e+02 | 9.76e-01 |   0.2 |  10;3 |  10;3 
##    27 | 0.997363 | 3.704003e+00 |   2.69e-04 | 1.30e+02 | 7.55e-01 |   0.2 |  10;3 |  10;3 
##    28 | 0.997364 | 3.703124e+00 |   2.37e-04 | 2.01e+02 | 1.17e+00 |   0.1 |  10;2 |  10;2 
##    29 | 0.997365 | 3.702345e+00 |   2.10e-04 | 1.55e+02 | 9.05e-01 |   0.2 |  10;3 |  10;3 
##    30 | 0.997365 | 3.701654e+00 |   1.87e-04 | 2.41e+02 | 1.40e+00 |   0.1 |  10;2 |  10;2 
##    31 | 0.997366 | 3.701039e+00 |   1.66e-04 | 1.86e+02 | 1.08e+00 |   0.1 |  10;3 |  10;3 
##    32 | 0.997366 | 3.700495e+00 |   1.47e-04 | 1.44e+02 | 8.39e-01 |   0.2 |  10;3 |  10;3 
##    33 | 0.997366 | 3.700014e+00 |   1.30e-04 | 2.23e+02 | 1.30e+00 |   0.1 |  10;2 |  10;2 
##    34 | 0.997367 | 3.699586e+00 |   1.16e-04 | 1.73e+02 | 1.01e+00 |   0.2 |  10;3 |  10;3 
##    35 | 0.997367 | 3.699207e+00 |   1.03e-04 | 1.34e+02 | 7.78e-01 |   0.2 |  10;3 |  10;3 
##    36 | 0.997367 | 3.698872e+00 |   9.04e-05 | 2.07e+02 | 1.20e+00 |   0.2 |  10;2 |  10;2 
##    37 | 0.997367 | 3.698574e+00 |   8.08e-05 | 1.60e+02 | 9.32e-01 |   0.2 |  10;3 |  10;3 
##    38 | 0.997367 | 3.698310e+00 |   7.13e-05 | 2.48e+02 | 7.21e-01 |   0.2 |  10;3 |  10;2 
##    39 | 0.997368 | 3.698077e+00 |   6.31e-05 | 1.92e+02 | 1.12e+00 |   0.2 |  10;2 |  10;3 
##    40 | 0.997368 | 3.697869e+00 |   5.60e-05 | 2.97e+02 | 8.64e-01 |   0.3 |  10;3 |  10;2 
##    41 | 0.997368 | 3.697686e+00 |   4.97e-05 | 2.30e+02 | 1.34e+00 |   0.2 |  10;2 |  10;3 
##    42 | 0.997368 | 3.697523e+00 |   4.40e-05 | 1.78e+02 | 1.04e+00 |   0.2 |  10;3 |  10;3 
##    43 | 0.997368 | 3.697378e+00 |   3.90e-05 | 2.75e+02 | 8.01e-01 |   0.2 |  10;3 |  10;2 
##    44 | 0.997368 | 3.697251e+00 |   3.45e-05 | 2.13e+02 | 6.20e-01 |   0.2 |  10;3 |  10;3 
##    45 | 0.997368 | 3.697138e+00 |   3.05e-05 | 1.65e+02 | 9.60e-01 |   0.2 |  10;2 |  10;3 
##    46 | 0.997368 | 3.697038e+00 |   2.70e-05 | 2.55e+02 | 7.43e-01 |   0.2 |  10;3 |  10;2 
##    47 | 0.997368 | 3.696949e+00 |   2.40e-05 | 1.98e+02 | 5.75e-01 |   0.2 |  10;3 |  10;3 
##    48 | 0.997368 | 3.696871e+00 |   2.12e-05 | 1.53e+02 | 8.90e-01 |   0.1 |  10;2 |  10;3 
##    49 | 0.997369 | 3.696802e+00 |   1.88e-05 | 2.37e+02 | 1.38e+00 |   0.2 |  10;2 |  10;2 
##    50 | 0.997369 | 3.696740e+00 |   1.66e-05 | 1.83e+02 | 1.07e+00 |   0.2 |  10;3 |  10;3 
##    51 | 0.997369 | 3.696686e+00 |   1.47e-05 | 1.42e+02 | 8.26e-01 |   0.2 |  10;3 |  10;3 
##    52 | 0.997369 | 3.696638e+00 |   1.29e-05 | 2.19e+02 | 1.28e+00 |   0.2 |  10;2 |  10;2 
##    53 | 0.997369 | 3.696596e+00 |   1.14e-05 | 3.40e+02 | 9.89e-01 |   0.1 |  10;3 |  10;2 
##    54 | 0.997369 | 3.696558e+00 |   1.02e-05 | 1.31e+02 | 7.65e-01 |   0.2 |  10;3 |  10;4 
##    55 | 0.997369 | 3.696525e+00 |   8.94e-06 | 2.04e+02 | 1.18e+00 |   0.1 |  10;2 |  10;2 
##    56 | 0.997369 | 3.696496e+00 |   7.94e-06 | 1.58e+02 | 9.17e-01 |   0.1 |  10;3 |  10;3 
##    57 | 0.997369 | 3.696470e+00 |   7.05e-06 | 2.44e+02 | 7.10e-01 |   0.2 |  10;3 |  10;2 
##    58 | 0.997369 | 3.696447e+00 |   6.20e-06 | 1.89e+02 | 1.10e+00 |   0.2 |  10;2 |  10;3 
##    59 | 0.997369 | 3.696427e+00 |   5.50e-06 | 2.92e+02 | 8.50e-01 |   0.1 |  10;3 |  10;2 
##    60 | 0.997369 | 3.696409e+00 |   4.87e-06 | 2.26e+02 | 1.32e+00 |   0.1 |  10;2 |  10;3 
##    61 | 0.997369 | 3.696393e+00 |   4.27e-06 | 1.75e+02 | 1.02e+00 |   0.2 |  10;3 |  10;3 
##    62 | 0.997369 | 3.696379e+00 |   3.79e-06 | 1.35e+02 | 1.58e+00 |   0.2 |  10;2 |  10;3 
##    63 | 0.997369 | 3.696367e+00 |   3.31e-06 | 2.10e+02 | 1.22e+00 |   0.3 |  10;3 |  10;2 
##    64 | 0.997369 | 3.696356e+00 |   2.97e-06 | 1.62e+02 | 9.45e-01 |   0.2 |  10;3 |  10;3 
##    65 | 0.997369 | 3.696346e+00 |   2.63e-06 | 2.51e+02 | 7.31e-01 |   0.2 |  10;3 |  10;2 
##    66 | 0.997369 | 3.696337e+00 |   2.32e-06 | 1.94e+02 | 1.13e+00 |   0.2 |  10;2 |  10;3 
##    67 | 0.997369 | 3.696330e+00 |   2.04e-06 | 3.01e+02 | 8.76e-01 |   0.2 |  10;3 |  10;2 
##    68 | 0.997369 | 3.696323e+00 |   1.81e-06 | 2.33e+02 | 1.36e+00 |   0.3 |  10;2 |  10;3 
##    69 | 0.997369 | 3.696317e+00 |   1.58e-06 | 1.80e+02 | 1.05e+00 |   0.2 |  10;3 |  10;3 
##    70 | 0.997369 | 3.696312e+00 |   1.41e-06 | 2.79e+02 | 8.12e-01 |   0.2 |  10;3 |  10;2 
##    71 | 0.997369 | 3.696307e+00 |   1.24e-06 | 2.16e+02 | 1.26e+00 |   0.1 |  10;2 |  10;3 
##    72 | 0.997369 | 3.696303e+00 |   1.10e-06 | 1.67e+02 | 9.73e-01 |   0.1 |  10;3 |  10;3 
##    73 | 0.997369 | 3.696300e+00 |   9.71e-07 | 2.59e+02 | 1.51e+00 |   0.2 |  10;2 |  10;2 
## |-----|----------|--------------|------------|----------|----------|-------|-------|-------| 
##   
##  BY =  
##              x         y         z
## [1,] 2.2751335 3.9934754 4.6713671
## [2,] 0.7122153 0.8875855 0.2730237
## [3,] 0.7056483 3.9337857 1.0005628
## [4,] 1.9953587 0.9697620 0.2950984
## 
yy4 = check_Bmatrix(aa4$B)
## Archetype 1 is a mixture of only next rows with weights: 
##   
## 61 
##  1 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 2 is a mixture of only next rows with weights: 
##   
## 56 
##  1 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 3 is a mixture of only next rows with weights: 
##   
##        84         9 
## 0.5076878 0.4923122 
##   
## Weights add to: 
## [1] 1
##   
##   
## Archetype 4 is a mixture of only next rows with weights: 
##   
##        64        90 
## 0.8289975 0.1710025 
##   
## Weights add to: 
## [1] 1
##   
## 
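The leading rows passed as “initialrows” can be read off any row stochastic B matrix with base R. The following is a minimal sketch, assuming that “leading_rows” holds, for each archetype, the row with the greatest weight; the toy matrix `B` below is not the fitted one, it is filled only with the weights printed just above.

```r
# Toy B matrix (4 archetypes x 100 rows), filled only with the weights
# printed by check_Bmatrix above; every other entry is zero.
B <- matrix(0, nrow = 4, ncol = 100)
B[1, 61] <- 1
B[2, 56] <- 1
B[3, c(84, 9)]  <- c(0.5076878, 0.4923122)
B[4, c(64, 90)] <- c(0.8289975, 0.1710025)

# Each row of B must sum to one (row stochastic).
row_sums <- rowSums(B)

# Leading row of each archetype: the column with the greatest weight.
leading <- apply(B, 1, which.max)
leading   # 61 56 84 64, the rows used as initialrows
```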

Yes, we obtained a reduction of \(33\%\) in iterations with practically the same archetypes (the entries of BY differ by less than \(10^{-4}\)), SSE and varexpl.
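The columns of the printed trace can be cross-checked against each other. The sketch below uses only numbers copied from the table above. It assumes that the |dSSE|/SSE column is the absolute change divided by the current SSE, and it uses the definition of varexpl to recover the SST implied by each row.

```r
# Values copied from the printed trace (iterations 1, 2 and 73).
sse     <- c(iter1 = 4.442885, iter2 = 4.156170, iter73 = 3.696300)
varexpl <- c(iter1 = 0.996837, iter2 = 0.997042, iter73 = 0.997369)

# Relative SSE change between iterations 1 and 2 (the |dSSE|/SSE column).
rel_change <- abs(sse[["iter2"]] - sse[["iter1"]]) / sse[["iter2"]]

# Since varexpl = (SST - SSE) / SST, each row implies SST = SSE / (1 - varexpl).
sst_implied <- sse / (1 - varexpl)

signif(rel_change, 3)   # compare with 6.90e-02 in the trace
round(sst_implied)      # roughly constant, up to the rounding of varexpl
```

Because varexpl is printed with six decimals, the implied SST values agree only to within a fraction of a unit.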

Now it is time to plot all the above results, like the 2D demo case.

Two archetypes are exact CH vertices, one lies close to a CH vertex, and the last one is near the middle of the line segment connecting two other CH vertices.
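The position of a two-row mixture on its segment follows directly from the weights. A small sketch, with hypothetical endpoints `p` and `q` (not rows of the data frame) and the weights reported by “check_Bmatrix” for archetypes 3 and 4:

```r
# Hypothetical segment endpoints p and q (made up for illustration).
p <- c(0, 0, 0)
q <- c(10, 0, 0)

# Weights on the first listed row, copied from the check_Bmatrix output.
w <- c(arch3 = 0.5076878, arch4 = 0.8289975)

# Archetype = w * p + (1 - w) * q, one row per archetype.
arch <- outer(w, p) + outer(1 - w, q)

# Fraction of the segment travelled from p; it equals 1 - w.
frac_from_p <- sqrt(rowSums(sweep(arch, 2, p)^2)) / sqrt(sum((q - p)^2))
round(frac_from_p, 3)   # arch3: 0.492 (near the middle), arch4: 0.171 (close to p)
```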

Additionally, we observe that the archetypes are close to the “invisible” generators of the data set (circled crosses, not included in the data frame).

Searching for the most efficient initial points

Extensive work on AA has led to the following empirical results:

  1. if the initial solution is far from the convex hull (CH) vertices, then AA will probably get stuck
  2. not all CH vertices are suitable as candidate initial points, only the outmost of them
  3. if we choose the finally used points with the greatest weights, then the computation is faster

To take these empirical results into account, we have developed five functions. We now run them and check whether their outputs are CH vertices of the 3D data frame studied above. For that we first need the CH vertices of the 3D data frame, which we compute with “convhulln” from the package “geometry”:

ch=unique(do.call(c,as.list(geometry::convhulln(df,'Fx'))))
ch
##  [1] 90 56 65 67 24  4  9 61 96 64 89 82 23 84 29 10 33
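The one-liner above pools all indices returned by “convhulln” and keeps the distinct ones. Below is a minimal sketch of that flattening step on a small facet matrix that is made up for illustration; it is an assumption that the hull comes back as a matrix of row numbers, one facet per row. The sketch also checks that the rows used earlier as “initialrows” are among the CH vertices printed above.

```r
# Hypothetical facet matrix (one facet per row, entries are row numbers of df);
# made up to mimic the shape of a hull index matrix.
facets <- matrix(c(90, 56, 65,
                   56, 65, 67,
                   90, 65, 24), ncol = 3, byrow = TRUE)

# The spelling used above and a shorter equivalent.
v1 <- unique(do.call(c, as.list(facets)))
v2 <- unique(as.vector(facets))
identical(v1, v2)

# Membership test against the CH vertices printed above.
ch <- c(90, 56, 65, 67, 24, 4, 9, 61, 96, 64, 89, 82, 23, 84, 29, 10, 33)
c(61, 56, 84, 64) %in% ch
```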

Projected Convex Hull

This method can be used for all types of data frames, regardless of the number of variables \(d\). That is why it is the default option of the “method” argument used for finding the initial solution.

yy1 = find_outmost_projected_convexhull_points(df, kappas = 4)
yy1$outmost
## [1] 61 82 64 67
yy1$outmostall
## [1] 61 82 64 67
yy1$outmostall%in%ch
## [1] TRUE TRUE TRUE TRUE
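To illustrate why a projection-based search scales to any \(d\), here is a sketch of the general idea and not the package implementation. It assumes that the data are projected onto every pair of variables and that the 2D hull vertices, found with “chull” from “grDevices”, are pooled. The toy data and the seed are arbitrary.

```r
set.seed(1)                             # arbitrary seed for the toy data
X <- matrix(rnorm(200 * 4), ncol = 4)   # 200 points, 4 variables

# 2D hull vertices for every pair of variables, kept as a list.
hulls <- combn(ncol(X), 2,
               FUN = function(jk) grDevices::chull(X[, jk]),
               simplify = FALSE)

# Pool the row numbers found in all projections.
proj_vertices <- sort(unique(unlist(hulls)))
length(proj_vertices)
```

Each projection costs only a 2D hull computation, so the work grows with the number of variable pairs rather than with the complexity of a full \(d\)-dimensional hull.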

Convex Hull

In practice this method can be used only for data frames with \(d\leq\,6\) variables, see http://www.qhull.org. When it can be used, however, it gives the best results due to the PCHA theory, see [1] for details.

yy2 = find_outmost_convexhull_points(df, kappas = 4)
yy2$outmost
## [1] 61 64 82 67
yy2$outmostall
## [1] 61 64 82 67
yy2$outmostall%in%ch
## [1] TRUE TRUE TRUE TRUE

Partitioned Convex Hull

This method approximates the CH when the number of variables is \(d>6\). It partitions the points into mutually exclusive sets, computes the CH of each set and takes the union of the resulting vertices. Finally it computes the CH of that union, if that is feasible. (We do not run it here because it uses parallel computing. Please check your machine before running it.)

# yy3 = find_outmost_partitioned_convexhull_points(df, kappas = 4, nworkers = 10)
# yy3$outmost
# yy3$outmostall
# yy3$outmostall%in%ch
# 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10 .   
# Time difference of 2.769091 secs
# [1] 84  3
# [1] 61 64 82 67
# [1] 61 64 82 67
# [1] TRUE TRUE TRUE TRUE
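The partition, union and final hull steps described above can be sketched serially in 2D with base R. This is only an illustration of the idea under simplifying assumptions (two variables, “chull” from “grDevices”, no parallel workers), not the package function; the toy data and the seed are arbitrary.

```r
set.seed(2)                          # arbitrary seed for the toy data
X <- matrix(runif(2000), ncol = 2)   # 1000 points in the unit square

# Split the row numbers into 10 mutually exclusive parts.
parts <- split(seq_len(nrow(X)), rep(1:10, length.out = nrow(X)))

# Hull vertices of each part, as row numbers of X.
part_vertices <- unlist(
  lapply(parts, function(idx) idx[grDevices::chull(X[idx, ])]),
  use.names = FALSE
)

# Hull of the pooled vertices, mapped back to row numbers of X.
final <- part_vertices[grDevices::chull(X[part_vertices, ])]

# Compare with the hull computed on all points at once.
setequal(final, grDevices::chull(X))
```

When the final hull is feasible, the comparison should give TRUE, because every vertex of the full hull is also a vertex of the hull of its own part.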

Furthest Sum

This is the default method used in PCHA. Here we apply it many times and collect the unique points found. The most frequent ones are used as the initial solution. (We do not run it here, for the same reason as for the Partitioned Convex Hull.)

# yy4 = find_furthestsum_points(df, kappas = 4, nfurthest = 100, nworkers = 10, sortrows = TRUE)
# yy4$outmost
# yy4$outmostall
# yy4$outmostall%in%ch
# [1] 56 61 64 67
# [1] 56 61 64 67
# [1] TRUE TRUE TRUE TRUE
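The "apply many times, keep the most frequent" idea can be sketched in a few lines. The helper `furthest_sum` below is hypothetical and deliberately simplified; it is not the PCHA routine nor the package function. It greedily adds the row with the greatest summed distance to the rows chosen so far and drops the random start row at the end. Toy data and seed are arbitrary.

```r
# Simplified furthest-sum pass: starting from row `start`, repeatedly add the
# row with the greatest summed distance to the rows chosen so far, then drop
# the start row. D is a full distance matrix.
furthest_sum <- function(D, k, start) {
  chosen <- start
  for (i in seq_len(k)) {
    s <- colSums(D[chosen, , drop = FALSE])
    s[chosen] <- -Inf                 # never pick the same row twice
    chosen <- c(chosen, which.max(s))
  }
  unname(chosen[-1])
}

set.seed(3)                             # arbitrary seed for the toy data
X <- matrix(rnorm(300 * 3), ncol = 3)   # 300 points, 3 variables
D <- as.matrix(dist(X))

# Repeat from 100 random start rows and count how often each row is selected.
runs <- replicate(100, furthest_sum(D, k = 4, start = sample(nrow(X), 1)))
freq <- sort(table(runs), decreasing = TRUE)

# The 4 most frequently selected rows serve as candidate initial rows.
candidates <- as.integer(names(freq)[1:4])
candidates
```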

Outmost

This method is the most naive one, but also the simplest. Keep in mind that for a data frame with dimensions \(n\times\,d\) you will need \(\frac{8\,n^2}{2^{30}}=\frac{n^2}{2^{27}}\) GB of RAM for numeric entries, so use it with caution!

yy5 = find_outmost_points(df, kappas = 4)
yy5$outmost
## [1] 61 64 82 56
yy5$outmostall
## [1] 61 64 82 56 67
yy5$outmostall%in%ch
## [1] TRUE TRUE TRUE TRUE TRUE
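The memory estimate quoted for this method is easy to evaluate before running it. A small sketch, reading the formula as the size of an \(n\times\,n\) matrix of 8-byte numeric entries (this reading is an assumption; `ram_gb` is a hypothetical helper name):

```r
# Memory (in GB) for an n x n matrix of 8-byte doubles: 8 * n^2 / 2^30.
ram_gb <- function(n) n^2 / 2^27

n <- c(1e3, 1e4, 1e5)
data.frame(n = n, GB = signif(ram_gb(n), 3))
```

For example, \(n=10^4\) rows need about 0.75 GB, while \(n=10^5\) rows already need about 74.5 GB.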

From our results we observe that the Projected and Partitioned Convex Hull methods select the same rows (61, 64, 67, 82) as the Convex Hull method.
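This comparison can be made explicit with “setequal”, which ignores the order of the rows. The row numbers below are copied from the “outmost” outputs shown above, including the commented runs; the list name `outmost_rows` is only illustrative.

```r
# Row numbers copied from the outputs above.
outmost_rows <- list(
  projected   = c(61, 82, 64, 67),
  convexhull  = c(61, 64, 82, 67),
  partitioned = c(61, 64, 82, 67),
  furthestsum = c(56, 61, 64, 67),
  outmost     = c(61, 64, 82, 56)
)

# Does each method select the same set of rows as the Convex Hull method?
same <- sapply(outmost_rows, setequal, y = outmost_rows$convexhull)
same
```

Furthest Sum and Outmost each replace one of the four Convex Hull rows by row 56, which is still a CH vertex.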

This is extremely useful because, as trials show and as http://www.qhull.org explains, computing the CH becomes infeasible as the number of variables increases. In practice it is meaningful to compute it directly only if \(d\leq\,6\).

Please send your comments, suggestions or bug reports to the package author.

References

[1] M. Mørup and L. K. Hansen, “Archetypal analysis for machine learning and data mining”, Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033

[2] Source: http://www.mortenmorup.dk/MMhomepageUpdated_files/Page327.htm (last accessed 2022-07-04)