To give a better understanding of the `MAKL` library, we build a simple example in this document. We first create a synthetic dataset of 1000 rows and 6 features, drawn from the standard Gaussian distribution.

```
library(MAKL)
set.seed(64327) #midas
df <- matrix(rnorm(6000, 0, 1), nrow = 1000)
colnames(df) <- c("F1", "F2", "F3", "F4", "F5", "F6")
```

For the `membership` argument of `makl_train()`, we prepare a list consisting of two groups: the first contains the features F1, F5 and F6; the second contains the rest. Note that the column names of the input dataset should be a superset of the union of all feature names in the `groups` list.

```
# check colnames(df) for them to be matching with group members
groups <- list()
groups[[1]] <- c("F1", "F5", "F6")
groups[[2]] <- c("F2", "F3", "F4")
```
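The superset requirement stated above can be checked before training. The following is a minimal sketch in base R; the names `feature_names`, `missing_features` and `groups_ok` are introduced here for illustration and are not part of `MAKL`.

```r
# Column names and groups as defined in the example (no MAKL calls needed).
feature_names <- c("F1", "F2", "F3", "F4", "F5", "F6")
groups <- list(c("F1", "F5", "F6"), c("F2", "F3", "F4"))

# Features referenced by any group but absent from the dataset columns.
missing_features <- setdiff(unlist(groups), feature_names)

# TRUE when the column names are a superset of the union of all group members.
groups_ok <- length(missing_features) == 0
```

If `groups_ok` is `FALSE`, `missing_features` lists the group members that have no matching column.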

We then create the response vector `y` so that it depends on the second, third and fourth features, namely F2, F3 and F4: if, for a data instance, the sum of the entries in the second, third and fourth columns is positive, the corresponding response is assigned +1; otherwise, it is assigned -1.

```
y <- c()
for(i in 1:nrow(df)) {
  if((df[i, 2] + df[i, 3] + df[i, 4]) > 0) {
    y[i] <- +1
  } else {
    y[i] <- -1
  }
}
```
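The same labelling rule can also be written without an explicit loop. The sketch below rebuilds the dataset in base R and compares a loop version with a vectorized one; the names `y_loop` and `y_vec` are chosen here for illustration only.

```r
# Recreate the synthetic dataset from the example.
set.seed(64327)
df <- matrix(rnorm(6000, 0, 1), nrow = 1000)
colnames(df) <- c("F1", "F2", "F3", "F4", "F5", "F6")

# Loop version: +1 when F2 + F3 + F4 is positive, -1 otherwise.
y_loop <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  y_loop[i] <- if (df[i, 2] + df[i, 3] + df[i, 4] > 0) 1 else -1
}

# Vectorized version of the same rule, selecting the columns by name.
y_vec <- ifelse(rowSums(df[, c("F2", "F3", "F4")]) > 0, 1, -1)

# Both constructions should agree on every row.
same_labels <- all(y_loop == y_vec)
```

Selecting the columns by name rather than by position keeps the rule tied to F2, F3 and F4 even if the column order changes.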

We use the synthetic dataset `df` and the response vector `y` as our training dataset and training response vector in `makl_train()`. We set the number of random features `D` to 2, which makes sense given that our training dataset is 6-dimensional. We set `sigma_N`, the number of rows used for the distance matrix calculation, to 1000, and `lambda_set` to 0.9, 0.8, 0.7, 0.6 for sparse solutions. As the membership list, we use the `groups` list that we created above.

```
makl_model <- makl_train(X = df, y = y, D = 2, sigma_N = 1000, CV = 1, membership = groups, lambda_set = c(0.9, 0.8, 0.7, 0.6))
#> Lambda: 155.0901 nr.var: 5
#> Lambda: 137.8579 nr.var: 5
#> Lambda: 120.6257 nr.var: 5
#> Lambda: 103.3934 nr.var: 5
```

When we check the coefficients of our model, we see that the kernel chosen for prediction by `makl_train()` was the kernel of the second group. This is the expected result, since we created the response vector `y` to depend on the members of the second group in the `groups` list.

```
makl_model$model$coefficients
#>       155.090126229481 137.857889981761 120.625653734041 103.39341748632
#>  [1,]       0.00000000        0.0000000        0.0000000       0.0000000
#>  [2,]       0.00000000        0.0000000        0.0000000       0.0000000
#>  [3,]       0.00000000        0.0000000        0.0000000       0.0000000
#>  [4,]       0.00000000        0.0000000        0.0000000       0.0000000
#>  [5,]      -0.29314353       -0.5938544       -0.9106226      -1.2539243
#>  [6,]       0.06703617        0.1352210        0.2057486       0.2799665
#>  [7,]       0.24539658        0.4973664        0.7630398       1.0509792
#>  [8,]      -0.36108294       -0.7320709       -1.1246002      -1.5535840
#>  [9,]       0.12450233        0.1542956        0.1858601       0.2195980
```
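To read the sparsity pattern off such a matrix, one can list the rows with non-zero entries. The sketch below copies the printed values into a plain matrix named `coefs` (a name chosen here) instead of calling `MAKL`. How the rows map to groups is not stated in this document; a plausible reading, offered only as an assumption, is that rows 1 to 4 belong to the first group, rows 5 to 8 to the second, and row 9 is an intercept.

```r
# Coefficient values copied from the printed output, one column per lambda.
coefs <- matrix(
  c(0, 0, 0, 0, -0.29314353, 0.06703617, 0.24539658, -0.36108294, 0.12450233,
    0, 0, 0, 0, -0.5938544,  0.1352210,  0.4973664,  -0.7320709,  0.1542956,
    0, 0, 0, 0, -0.9106226,  0.2057486,  0.7630398,  -1.1246002,  0.1858601,
    0, 0, 0, 0, -1.2539243,  0.2799665,  1.0509792,  -1.5535840,  0.2195980),
  nrow = 9
)

# Rows that are non-zero for at least one lambda value.
nonzero_rows <- which(rowSums(coefs != 0) > 0)

# Number of non-zero coefficients for each lambda value (each column).
n_nonzero <- colSums(coefs != 0)
```

Here `nonzero_rows` is 5 to 9 and `n_nonzero` is 5 in every column, which is consistent with the `nr.var: 5` lines printed during training.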

Now, let us create a synthetic test dataset `df_test` and a synthetic test response vector `y_test` to use in `makl_test()` to check the results.

```
df_test <- matrix(rnorm(600, 0, 1), nrow = 100)
colnames(df_test) <- c("F1", "F2", "F3", "F4", "F5", "F6")
y_test <- c()
for(i in 1:nrow(df_test)) {
  if((df_test[i, 2] + df_test[i, 3] + df_test[i, 4]) > 0) {
    y_test[i] <- +1
  } else {
    y_test[i] <- -1
  }
}
result <- makl_test(X = df_test, y = y_test, makl_model = makl_model)
```

The list `result` contains two elements: 1) the predictions for the test response vector `y_test`, and 2) the area under the ROC curve (AUROC) and the number of selected kernels, reported for each element of `lambda_set` if `CV` is not applied, or only for the best `lambda` in `lambda_set` if `CV` is applied.

```
result$auroc_kernel_number
#>     auroc_array n_selected_kernels
#> 0.9   0.9494179                  1
#> 0.8   0.9494179                  1
#> 0.7   0.9498193                  1
#> 0.6   0.9498193                  1
```
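For readers unfamiliar with AUROC, the following base R sketch computes it from prediction scores and +1/-1 labels using the rank-based (Mann-Whitney) formula. The function `auroc()` is a hypothetical helper written for this illustration; it is not part of `MAKL`, and `makl_test()` may compute the value differently.

```r
# Hypothetical helper: AUROC from scores and +1/-1 labels via the rank formula.
auroc <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # ties receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores: one positive instance is ranked below one negative instance.
toy_scores <- c(0.1, 0.4, 0.35, 0.8)
toy_labels <- c(-1, -1, 1, 1)
toy_auc <- auroc(toy_scores, toy_labels)  # 0.75
```

A value of 1 means every positive instance is scored above every negative one, while 0.5 corresponds to random ordering.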