[R] difference between createPartition and createfold functions
bby2103 at columbia.edu
Mon Oct 3 17:10:57 CEST 2011
Thanks for the note. In your last paragraph, did you mean "in
createDataPartition"? I'm a little vague about what returnTrain option
Quoting Max Kuhn <mxkuhn at gmail.com>:
> Basically, createDataPartition is used when you need to make one or
> more simple two-way splits of your data. For example, if you want to
> make a training and test set and keep your classes balanced, this is
> what you could use. It can also make multiple splits of this kind (or
> leave-group-out CV aka Monte Carlo CV aka repeated training test
> createFolds is exclusively for k-fold CV. Their usage is similar when
> you use the returnTrain = TRUE option in createFolds.
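To make the contrast concrete, here is a minimal sketch, assuming the caret package is installed. The label vector y is made up for illustration; only createDataPartition, createFolds and their arguments come from caret.

```r
library(caret)

# Made-up class labels, for illustration only
y <- factor(rep(c("a", "b"), times = c(60, 40)))

# One stratified two-way split: the result holds the TRAINING indices
set.seed(1)
train_idx <- createDataPartition(y, p = 0.8, list = FALSE)[, 1]

# k-fold CV: by default each list element holds the HELD-OUT indices
set.seed(1)
folds_test <- createFolds(y, k = 5)

# With returnTrain = TRUE each element holds the TRAINING indices instead,
# which is the form closest to what createDataPartition returns
set.seed(1)
folds_train <- createFolds(y, k = 5, returnTrain = TRUE)
```

With times > 1, createDataPartition repeats the same kind of random split, so the held-out sets can overlap; the held-out sets from createFolds never do.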
> On Sun, Oct 2, 2011 at 4:00 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> On Sun, Oct 2, 2011 at 3:54 PM, <bby2103 at columbia.edu> wrote:
>>> Hi Steve,
>>> Thanks for the note. I did try the example and the result didn't make sense
>>> to me. For splitting a vector, what you describe is a big difference btw
>>> them. For splitting a dataframe, I now wonder if these 2 functions are the
>>> wrong choices. They seem to split the columns, at least in the few things I
>> Sorry, I'm a bit confused now as to what you are after.
>> You don't pass a data.frame into any of the
>> createFolds/DataPartition functions from the caret package.
>> You pass in a *vector* of labels, and these functions tell you which
>> indices into the vector to use as examples to hold out (or keep
>> (depending on the value you pass in for the `returnTrain` argument))
>> between each fold/partition of your learning scenario (eg. cross
>> validation with createFolds).
>> You would then use these indices to keep (remove) the rows of a
>> data.frame, if that is how you are storing your examples.
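The row-subsetting step described above might look like the following. It is only a sketch, assuming caret is installed; the data.frame df and its column names are hypothetical.

```r
library(caret)

set.seed(42)
# Hypothetical example data; 'df', 'x' and 'label' are illustrative names
df <- data.frame(x = rnorm(50),
                 label = factor(rep(c("yes", "no"), each = 25)))

# Pass the label VECTOR, not the whole data.frame
idx <- createDataPartition(df$label, p = 0.7, list = FALSE)[, 1]

# Use the returned indices to select ROWS; all columns are kept
train <- df[idx, ]
test  <- df[-idx, ]
```

Passing the whole data.frame instead of df$label is a likely reason the result appears to split columns, since a data.frame is a list of columns.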
>> Does that make sense?
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact