Continuing from Section 3.4 of the vignette, we revisit the code used to obtain out-of-sample test error rates, and extend the analysis to a leave-one-out cross-validation (LOOCV) scheme.

The `iprior()` function includes an argument to conveniently specify which data samples should be used for training, with any remaining samples used for testing. The out-of-sample test error rates are then reported together. The examples in the vignette can be conducted as follows:

```
library(iprior)
data(tecator, package = "caret")  # loads absorp and endpoints
fat <- endpoints[, 2]
absorp <- -t(diff(t(absorp)))  # take first differences
mod1 <- iprior(fat, absorp, train.samp = 1:172, method = "mixed")
```

```
## Running 5 initial EM iterations
## ===========================================================================
## Now switching to direct optimisation
## iter 10 value 222.772193
## final value 222.642108
## converged
```

The prediction error (training and test) can then be obtained easily:
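A minimal sketch of how these figures can be pulled from the fitted object, assuming the `get_prederror()` accessor exported by the iprior package:

```r
# Training and test RMSE of mod1 (the test samples are those excluded from
# train.samp); get_prederror() is assumed to be the package's accessor here.
get_prederror(mod1)
```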

```
## Training RMSE     Test RMSE
##      2.890732      2.890353
```

With the above conveniences, it is easy to wrap this in a loop to perform \(k\)-fold cross-validation; this is done in the `iprior_cv()` function. We now analyse the predictive performance of I-prior models using a LOOCV scheme. For each of the `n = 215` samples, one observation pair is left out and the model trained on the rest; the prediction error is then obtained for the left-out observation. This is repeated for all `n = 215` samples, and the average of the prediction errors calculated.
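The scheme just described can be sketched in a few lines of base R, using `lm()` on toy data as a stand-in for the I-prior fit (`iprior_cv()` automates this loop):

```r
# Minimal LOOCV sketch: leave one observation out, train on the rest,
# record the squared prediction error for the left-out observation.
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

sq.err <- numeric(n)
for (i in seq_len(n)) {
  fit <- lm(y ~ x, data = dat[-i, ])        # train with observation i left out
  pred <- predict(fit, newdata = dat[i, ])  # predict the left-out observation
  sq.err[i] <- (dat$y[i] - pred)^2
}
loocv.rmse <- sqrt(mean(sq.err))            # average the prediction errors
loocv.rmse
```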

For the linear RKHS, the code to perform the LOOCV in the `iprior` package is as follows:
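The call itself is elided above; following the pattern of the fits for the remaining models below, the linear-kernel (default) LOOCV fit would look like:

```r
# LOOCV for the linear RKHS: folds = Inf requests leave-one-out, and the EM
# stopping criterion is relaxed to 1e-2 as discussed below.
mod1.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf,
                     control = list(stop.crit = 1e-2))
mod1.cv
```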

```
## Results from Leave-one-out Cross Validation
## Training RMSE: 2.869906
## Test RMSE : 2.331397
```

Notice the argument `folds = Inf`. Since the `iprior_cv()` function essentially performs a \(k\)-fold cross-validation experiment, setting `folds` equal to the sample size or higher tells the function to perform LOOCV. Also note that the EM algorithm was used to fit the model, with the stopping criterion relaxed to `1e-2`; this offered faster convergence without affecting predictive abilities. The resulting fit gives the training and test root mean squared error (RMSE) for the cross-validation experiment.

The code for the remaining models is given below. As this takes quite a long time to run, it has been run locally and the results saved in the data set `tecator.cv` within the `iprior` package.

```
mod2.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "poly2",
                     est.offset = TRUE, control = list(stop.crit = 1e-2))
mod3.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "poly3",
                     est.offset = TRUE, control = list(stop.crit = 1e-2))
mod4.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "fbm",
                     control = list(stop.crit = 1e-2))
mod5.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "fbm",
                     est.hurst = TRUE, control = list(stop.crit = 1e-2))
mod6.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "se",
                     est.lengthscale = TRUE, control = list(stop.crit = 1e-2))

# Function to tabulate the results
tecator_tab_cv <- function() {
  tab <- t(sapply(list(mod1.cv, mod2.cv, mod3.cv, mod4.cv, mod5.cv, mod6.cv),
                  function(mod) {
                    # mod$mse holds the per-fold training and test MSEs; take
                    # square roots before averaging to report RMSEs
                    res <- as.numeric(apply(sqrt(mod$mse[, -1]), 2, mean))
                    c("Training RMSE" = res[1], "Test RMSE" = res[2])
                  }))
  rownames(tab) <- c("Linear", "Quadratic", "Cubic", "fBm-0.5", "fBm-MLE",
                     "SE-MLE")
  tab
}
```

The results are tabulated below.

| | Training RMSE | Test RMSE |
|---|---|---|
| Linear | 2.87 | 2.33 |
| Quadratic | 2.98 | 2.66 |
| Cubic | 2.97 | 2.64 |
| fBm-0.5 | 0.09 | 0.50 |
| fBm-MLE | 0.01 | 0.46 |
| SE-MLE | 0.36 | 2.07 |