---
title: "Conventions for MLModels Implementation"
author: "Brian J Smith"
date: "2021-07-23"
output: rmarkdown::html_vignette
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{Conventions for MLModels Implementation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Model Constructor Components

- `MLModel` is a function supplied by the **MachineShop** package. It allows for
  the integration of statistical and machine learning models supplied by other R
  packages with the **MachineShop** model fitting, prediction, and performance
  assessment tools.
- The following are guidelines for writing model constructor functions that are
  wrappers around the `MLModel` function.
- In this context, the term "constructor" refers to the wrapper function and
  "source package" to the package supplying the original model implementation.

### Constructor Arguments

- The constructor should produce a valid model if called without any arguments;
  i.e., not have any required arguments.
- The source package defaults will be used for parameters with `NULL` values.
- Model formula, data, and weights are separate from model parameters and should
  not be defined as constructor arguments.

### name Slot

- Use the same name as the constructor.

### packages Slot

- Include all external packages whose functions are called directly from within
  the constructor.
- Use `::` to reference source package functions.

### response_types Slot

- Include all response variable types (`"binary"`, `"factor"`, `"matrix"`,
  `"numeric"`, `"ordered"`, and/or `"Surv"`) that can be analyzed with the model.

### weights Slot

- Logical indicating whether the model supports case weights.

### params Slot

- List of parameter values set by the constructor, typically obtained internally
  with `new_params(environment())` if all arguments are to be passed to the
  source package fit function as supplied.
  Additional steps may be needed to pass the constructor arguments to the source
  package in a different format; e.g., when some model parameters must be passed
  in a control structure, as in `C50Model` and `CForestModel`.

### fit Function

- The first three arguments should be `formula`, `data`, and `weights` followed
  by an ellipsis (`...`).
- If weights are not supported, the following, or equivalent, should be included
  in the function:

```{r eval = FALSE}
if(!all(weights == 1)) warning("weights are not supported and will be ignored")
```

- Only add elements to the resulting fit object if they are needed and will be
  used in the `predict` or `varimp` functions.
- Return the fit object.

### predict Function

- The arguments are a model fit `object`, `newdata` frame, optionally `times`
  for prediction at survival time points, and an ellipsis.
- The predict function should return one of the following, depending on the
  response type:
    - a vector or column matrix of probabilities for the second level of binary
      factors,
    - a matrix whose columns contain the probabilities for factors with more
      than two levels,
    - a matrix of predicted responses for matrix responses,
    - a vector or column matrix of predicted responses for numeric responses,
    - a matrix whose columns contain survival probabilities at `times` if
      supplied, or
    - a vector of predicted survival means if `times` are not supplied.

### varimp Function

- Should have a single model fit `object` argument followed by an ellipsis.
- Variable importance results should generally be returned as a vector with
  elements named after the corresponding predictor variables. The package will
  handle conversions to a data frame and `VariableImportance` object. If there
  is more than one set of relevant variable importance measures, they can be
  returned as a matrix or data frame with predictor variable names as the row
  names.

## Documenting an MLModel

### Model Parameters

- Include the first sentences from the source package.
- Start sentences with the parameter value type (logical, numeric, character,
  etc.).
- Start sentences with lowercase.
- Omit indefinite articles (a, an, etc.) from the starting sentences.

### Details Section

- Include response types (binary, factor, matrix, numeric, ordered, and/or
  Surv).
- Include the following sentence:

> Default values for the \code{NULL} arguments and further model details can be
> found in the source link below.

### Return (Value) Section

- Include the following sentence:

> MLModel class object.

### See Also Section

- Include a link to the source package function and the other method functions
  shown below.

```
\code{\link[]{}}, \code{\link{fit}}, \code{\link{resample}}
```

## Package Extensions

- If adding a new model to the package, save its source code in a file whose
  name begins with "ML_" followed by the model name, and ending with a .R
  extension; e.g., `"R/ML_CustomModel.R"`.
- Export the model in `NAMESPACE`.
- Add any required packages to the "Suggests" section of `DESCRIPTION`.
- Add the model to `R/models.R`.
- Add the model to `R/modelinfo.R`.
- Add a unit testing file to `tests/testthat`.
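
The fit, predict, and varimp conventions from the Model Constructor Components section might be sketched as follows for a binary response, with `stats::glm` standing in for a source package function. The names `glm_fit`, `glm_predict`, and `glm_varimp` and the simulated data are illustrative assumptions and are not part of **MachineShop**; in a real constructor the three functions would be supplied to `MLModel` as its fit, predict, and varimp functions.

```r
# Hypothetical helper functions (not part of MachineShop) that follow the
# fit, predict, and varimp conventions, using stats::glm as the source function.

# fit: formula, data, and weights come first, followed by an ellipsis.
glm_fit <- function(formula, data, weights, ...) {
  # Point the formula at this frame so that glm can locate `weights` when it
  # builds the model frame.
  environment(formula) <- environment()
  # Return the fit object without adding extra elements.
  stats::glm(formula, data = as.data.frame(data), weights = weights,
             family = stats::binomial, ...)
}

# predict: model fit object and newdata frame, followed by an ellipsis.
# Returns a vector of probabilities for the second level of a binary factor.
glm_predict <- function(object, newdata, ...) {
  unname(stats::predict(object, newdata = as.data.frame(newdata),
                        type = "response"))
}

# varimp: single model fit object followed by an ellipsis.
# Returns absolute z statistics as a named vector. The names are coefficient
# names, which match the predictor names only for numeric predictors.
glm_varimp <- function(object, ...) {
  z <- stats::coef(object) / sqrt(diag(stats::vcov(object)))
  abs(z)[-1]  # drop the intercept
}

# Usage on simulated data (illustrative only).
set.seed(123)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- factor(rbinom(100, 1, plogis(dat$x1)), labels = c("no", "yes"))

fit <- glm_fit(y ~ x1 + x2, data = dat, weights = rep(1, nrow(dat)))
probs <- glm_predict(fit, newdata = dat)  # probabilities of level "yes"
vi <- glm_varimp(fit)                     # named by x1 and x2
```

Keeping the three functions free of references to the constructor's own arguments, as in this sketch, lets parameter values reach the fit function only through the ellipsis, which matches the role of the params slot described above.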