[R-pkgs] new packages: caret, caretLSF and caretNWS

Kuhn, Max Max.Kuhn at pfizer.com
Fri Oct 5 20:33:37 CEST 2007

Three more packages will be showing up on your mirror soon.

The caret package (short for "Classification And REgression Training")
aims to simplify the model building process. The package has functions for:

  - data splitting: balanced train/test splits, cross-validation and
bootstrapping sampling functions. There is also a function for maximum
dissimilarity sampling.

  - pre-processing: simple centering/scaling, filter methods for highly
correlated predictors, identification of linear combinations, removal of
"near zero variance" predictors and the "spatial-sign" transformation
function for predictors.

  - model building: the train function provides a common interface to 27
model types. Models can be tuned over complexity parameters using
resampling methods. A few functions also exist for plotting the results
from the tuning process.

  - bagged versions of mars (via the earth package) and fda models.

  - partial least squares classification model (based on the pls
package).

  - yet another knn function (this one returns the vote proportions for
all the classes) based on the functions in MASS and ipred.

  - a variable importance class and methods for a variety of models
(e.g. trees, pls, mars, etc.) in addition to model-free methods.

  - RMA-type normalization methods for oligo arrays that can be used on
a per sample basis. These functions are well suited for normalizing
chips individually using information from the training set samples.
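
To give a feel for how the splitting, pre-processing and tuning pieces
listed above fit together, here is a minimal sketch. It assumes the
function names createDataPartition, nearZeroVar, findCorrelation,
preProcess, train and trainControl as I understand them from the caret
documentation; the iris data, the 0.9 correlation cutoff, the choice of
knn and the 25 bootstrap resamples are only illustrative choices, so
please check the vignettes for the arguments in your installed version.

```r
library(caret)

set.seed(1)
data(iris)
x <- iris[, 1:4]
y <- iris$Species

# Balanced (stratified on the outcome) 75/25 train/test split
in_train <- createDataPartition(y, p = 0.75, list = FALSE)
train_x <- x[in_train, ]
test_x  <- x[-in_train, ]
train_y <- y[in_train]
test_y  <- y[-in_train]

# Filters: flag near-zero-variance and highly correlated predictors,
# using the training set only (the cutoff is an illustrative choice)
nzv      <- nearZeroVar(train_x)
high_cor <- findCorrelation(cor(train_x), cutoff = 0.9)
drop_cols <- union(nzv, high_cor)
if (length(drop_cols) > 0) {
  train_x <- train_x[, -drop_cols, drop = FALSE]
  test_x  <- test_x[, -drop_cols, drop = FALSE]
}

# Centering and scaling estimated on the training set, applied to both
pp       <- preProcess(train_x, method = c("center", "scale"))
train_pp <- predict(pp, train_x)
test_pp  <- predict(pp, test_x)

# Tune k-nearest neighbors over 5 candidate values of k,
# using 25 bootstrap resamples to estimate performance
fit <- train(train_pp, train_y, method = "knn", tuneLength = 5,
             trControl = trainControl(method = "boot", number = 25))
print(fit)

# Predict the held-out samples with the tuned model
pred <- predict(fit, test_pp)
```

Estimating the filters and the centering/scaling on the training rows
only keeps the test set out of the model building process, so the
held-out predictions remain an honest check.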

Three vignettes come with the package and include several examples. A
few example data sets, mostly from quantitative structure-activity
relationship (QSAR) experiments, are also contained in the package.

The other two packages, caretLSF and caretNWS, provide alternate
versions of caret's train function that can be executed in parallel
using the Rlsf and nws packages, respectively. For example, if
bootstrapping is used to tune a model, the B models can be split over M
different nodes. For caretNWS, either the free nws package or the
commercial version (nwsPro) can be used. The commercial version offers
fault tolerance features (as well as support). Email
info at revolution-computing.com instead of me for more information about
nwsPro or nws.
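
To illustrate the idea of splitting B bootstrap models over M nodes,
here is a rough sketch. It is not the caretLSF or caretNWS interface:
it uses the parallel package's socket cluster as a stand-in for Rlsf or
nws, and the helper fit_chunk, the lm model on mtcars and the values of
B and M are all hypothetical choices made for the example.

```r
library(parallel)

set.seed(1)
B <- 20  # number of bootstrap resamples (illustrative)
M <- 2   # number of worker nodes (illustrative)

# One vector of row indices per bootstrap resample, drawn on the master
n <- nrow(mtcars)
boot_idx <- lapply(seq_len(B), function(b) sample(n, n, replace = TRUE))

# Assign the B resamples to the M nodes in round-robin fashion
node_of <- rep(seq_len(M), length.out = B)
chunks  <- split(boot_idx, node_of)

# Work done by one node (hypothetical helper): fit a model to each
# resample and compute the RMSE on the out-of-bag rows
fit_chunk <- function(idx_list, data) {
  sapply(idx_list, function(idx) {
    fit <- lm(mpg ~ wt + hp, data = data[idx, ])
    oob <- data[-unique(idx), ]
    sqrt(mean((oob$mpg - predict(fit, oob))^2))
  })
}

# Send one chunk of resamples to each node and collect the results
cl <- makeCluster(M)
rmse_by_node <- parLapply(cl, chunks, fit_chunk, data = mtcars)
stopCluster(cl)

# Pool the B out-of-bag estimates back on the master
rmse <- unlist(rmse_by_node)
mean(rmse)
```

Because the resampling indices are drawn on the master before any work
is sent out, the pooled result does not depend on how many nodes are
used.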

Thanks to Steve Weston, Jed Wing and Andre Williams who contributed to
these packages.

Please email me at max dot kuhn at pfizer dot com with questions,
suggestions or bug reports.

