[R] cross-validation in rpart
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Mar 19 16:08:48 CET 2011
On Sat, 19 Mar 2011, Penny B wrote:
> I am trying to find out what type of sampling scheme is used to select the 10
> subsets in 10-fold cross-validation process used in rpart to choose the best
> tree. Is it simple random sampling? Is there any documentation available on
> this?
Not SRS (at least in its conventional meaning), as it is
partitioning: the 10 folds are disjoint.
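To make the distinction concrete, here is a minimal base-R sketch of one common way to partition observations at random into disjoint folds. It is an illustration only, not a copy of the rpart code: the values of n and k, the seed, and the names fold and folds are all assumptions made for the example.

```r
# Illustrative sketch (assumed values, not taken from rpart):
# randomly partition n observations into k disjoint folds.
set.seed(1)   # arbitrary seed, only so the example is reproducible
n <- 81       # assumed number of observations
k <- 10       # number of folds

# Repeat the labels 1..k out to length n, then shuffle them, so that
# every observation receives exactly one fold label.
fold <- sample(rep(seq_len(k), length.out = n))

# Fold sizes differ by at most one.
table(fold)

# Row indices in each fold; the folds are disjoint and cover 1..n.
folds <- split(seq_len(n), fold)
```

Because each observation gets exactly one label, no observation can appear in two folds, which is what separates this from drawing 10 independent simple random samples.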
Note that this happens in two places, in rpart() and in xpred.rpart(),
but the (default) method is the same. I presume you asked about the
first, but it wasn't clear.
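For reference, a short sketch of the two places, using the kyphosis data that ships with rpart. The model formula and the seed are arbitrary choices for illustration; xval = 10 is the documented default of rpart.control and is written out only to make it visible.

```r
library(rpart)

set.seed(1)  # arbitrary seed; the fold assignment is random

# Place 1: cross-validation inside rpart(), controlled by xval.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(xval = 10))
printcp(fit)  # the xerror and xstd columns come from this cross-validation

# Place 2: xpred.rpart() runs a separate cross-validation and returns
# cross-validated predictions, one row per observation.
xp <- xpred.rpart(fit, xval = 10)
dim(xp)
```

The two calls draw their folds separately, so the groups used by xpred.rpart() need not match those used when fit was built.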
There is a lot of documentation on the meaning of '10-fold
cross-validation', e.g. in my 1996 book (Pattern Recognition and
Neural Networks). There are a few slightly different ways to do it,
and you can read the rpart sources if you want to know the details.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595