rfPermute estimates the significance of importance
metrics for a Random Forest model by permuting the response variable. It
will produce null distributions of importance metrics for each predictor
variable and p-values of observed importances. The package also
includes several summary and visualization functions for
randomForest and rfPermute results. See
rfPermuteTutorial() in the package for a guide on running,
summarizing, and diagnosing rfPermute and
randomForest models.
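
The permutation idea described above can be sketched in base R. This is an illustration only, not the package's implementation: it swaps the Random Forest importance metric for a simple stand-in (absolute correlation with the response), and the names `importance_metric`, `num_rep`, and the simulated data are all hypothetical. The `+ 1` correction in the p-value is a common convention assumed here, not something the source states.

```r
# Conceptual sketch in base R (not the rfPermute implementation).
# Stand-in importance metric: absolute Pearson correlation with the response.
set.seed(1)

n <- 200
x <- data.frame(
  signal = rnorm(n),   # predictor that drives the response
  noise  = rnorm(n)    # predictor unrelated to the response
)
y <- 2 * x$signal + rnorm(n)

# Hypothetical helper: one importance value per predictor
importance_metric <- function(x, y) {
  sapply(x, function(col) abs(cor(col, y)))
}

observed <- importance_metric(x, y)

# Null distribution: recompute the metric after shuffling the response,
# which breaks any real association between predictors and response
num_rep <- 500
null_dist <- t(replicate(num_rep, importance_metric(x, sample(y))))

# Permutation p-value: share of null values at least as large as the
# observed value (the + 1 counts the observed value itself)
p_values <- sapply(names(observed), function(v) {
  (sum(null_dist[, v] >= observed[[v]]) + 1) / (num_rep + 1)
})

print(round(observed, 3))
print(round(p_values, 3))
```

In this sketch each row of `null_dist` is one permutation and each column one predictor, so a predictor whose observed value sits far in the upper tail of its column gets a small p-value.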

To install the stable version from CRAN:

    install.packages('rfPermute')

To install the latest version from GitHub:

    # make sure you have devtools installed
    if (!require('devtools')) install.packages('devtools')
    # install from GitHub
    devtools::install_github('EricArcher/rfPermute')

rfPermute: Estimate Permutation p-values for Random
Forest Importance Metrics
importance: Extract rfPermute Importance Scores and p-values
plotNull: Plot Random Forest Importance Null Distributions
plotImpPreds: Distribution of Important Variables
summary: Summarize rfPermute and randomForest models
confusionMatrix: Confusion Matrix
casePredictions: Return predictions and votes for training cases
pctCorrect: Percent Correctly Classified
plotInbag: Distribution of sample inbag rates
plotPredictedProbs: Distribution of prediction assignment probabilities
plotProximity: Plot Random Forest Proximity Scores
plotTrace: Trace of cumulative error rates in forest
plotVotes: Vote Distribution
combineRP: Combine rfPermute models
balancedSampsize: Balanced Sample Size
cleanRFdata: Clean Random Forest Input Data

n predictors.
pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.

NOTE: v2.5 is a large redevelopment of the package.
The structure of rfPermute model objects has changed, making them
incompatible with previous versions. Also, the names and functionality of
several functions have changed to make them more consistent with one
another. A tutorial (under construction) is available within the package
as rfPermuteTutorial().

exptdErrRate
threshold argument in classConfInt and confusionMatrix to NULL
exptdErrRate and confusionMatrix
pctCorrect
casePredictions
plotConfMat, plotOOBtimes, plotRFtrace, and plotInbag, and plotImpVarDist visualizations.
confusionMatrix so it will work when randomForest model doesn’t have a $confusion element, like when model is result of combine-ing multiple models.
num.cores to NULL.
type argument to plotVotes to choose between area and bar charts.
plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make camelCase naming scheme more consistent in package.
plotNull from base graphics to ggplot2.
symb.metab data set.
n argument to impHeatmap.
classConfInt, confusionMatrix, plotVotes, pctCorrect.
plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
rfPermute so it works on Windows too.
impHeatmap function.
proximity.plot to use ggplot2 graphics.
rfPermute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance measures. See ?rfPermute for more information.
rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
nrep = 0 for rfPermute, a randomForest object is returned.
grid name clashes.
clean.rf.data where fixed predictors were not removed.
main argument in plot.rp.importance.
num.cores argument to rfPermute to take advantage of multi-threading
calc.imp.pval to keep it from indexing