[R] difference between createPartition and createfold functions

Max Kuhn mxkuhn at gmail.com
Mon Oct 3 17:44:47 CEST 2011


No, it is an argument to createFolds. Type ?createFolds to see the
appropriate syntax: "returnTrain: a logical. When true, the values
returned are the sample positions corresponding to the data used
during training. This argument only works in conjunction with list =
TRUE"
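
As a rough illustration of that argument, here is a minimal sketch. It assumes the caret package is installed; the label vector `y` and the data frame `dat` are made-up toy data, not anything from this thread.

```r
library(caret)

set.seed(1)
# Hypothetical toy labels: 20 examples in two balanced classes
y <- factor(rep(c("a", "b"), each = 10))

# Default (returnTrain = FALSE): each list element holds the
# positions of the examples held out in that fold
held_out <- createFolds(y, k = 5, list = TRUE, returnTrain = FALSE)

# returnTrain = TRUE: each list element holds the positions of the
# examples used for training in that fold instead
train_idx <- createFolds(y, k = 5, list = TRUE, returnTrain = TRUE)

# The functions take the label vector, not the data frame; the
# returned positions are then used to pick rows of the data frame
dat <- data.frame(x = seq_along(y), y = y)
fold1_train <- dat[train_idx[[1]], ]
fold1_test  <- dat[-train_idx[[1]], ]
```

Either form describes the same kind of split; the choice is only whether you would rather index the training rows directly or drop the held-out rows with negative indexing.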

On Mon, Oct 3, 2011 at 11:10 AM,  <bby2103 at columbia.edu> wrote:
> Hi Max,
>
> Thanks for the note. In your last paragraph, did you mean "in
> createDataPartition"? I'm a little vague about what returnTrain option does.
>
> Bonnie
>
> Quoting Max Kuhn <mxkuhn at gmail.com>:
>
>> Basically, createDataPartition is used when you need to make one or
>> more simple two-way splits of your data. For example, if you want to
>> make a training and test set and keep your classes balanced, this is
>> what you could use. It can also make multiple splits of this kind (also
>> known as leave-group-out CV, Monte Carlo CV, or repeated training/test
>> splits).
>>
>> createFolds is exclusively for k-fold CV. Their usage is similar when
>> you use the returnTrain = TRUE option in createFolds.
>>
>> Max
>>
>> On Sun, Oct 2, 2011 at 4:00 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> On Sun, Oct 2, 2011 at 3:54 PM,  <bby2103 at columbia.edu> wrote:
>>>>
>>>> Hi Steve,
>>>>
>>>> Thanks for the note. I did try the example and the result didn't make
>>>> sense to me. For splitting a vector, what you describe is a big
>>>> difference between them. For splitting a dataframe, I now wonder if
>>>> these 2 functions are the wrong choices. They seem to split the
>>>> columns, at least in the few things I tried.
>>>
>>> Sorry, I'm a bit confused now as to what you are after.
>>>
>>> You don't pass a data.frame into any of the
>>> createFolds/DataPartition functions from the caret package.
>>>
>>> You pass in a *vector* of labels, and these functions tell you which
>>> indices into the vector to use as examples to hold out (or keep,
>>> depending on the value you pass in for the `returnTrain` argument)
>>> in each fold/partition of your learning scenario (e.g. cross
>>> validation with createFolds).
>>>
>>> You would then use these indices to keep (remove) the rows of a
>>> data.frame, if that is how you are storing your examples.
>>>
>>> Does that make sense?
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>>
>>
>>
>>
>> --
>>
>> Max
>>
>>
>
>
>



-- 

Max


