[R] modification of cross-validations in rpart

Weidong Gu anopheles123 at gmail.com
Mon Jul 4 23:05:22 CEST 2011


One way around hacking rpart is to write code that does the K-fold sampling
by sampling unit outside rpart, then builds trees on the training sets and
summarizes prediction scores on the test sets.
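A minimal sketch of that idea, assuming a data frame with the response Y,
predictors X1..X3, and a column "fish" identifying the sampling unit (all
column names here are hypothetical, matching the poster's example):

```r
library(rpart)

# Grouped K-fold cross-validation: whole sampling units are assigned
# to folds, so repeated observations of one fish never straddle the
# train/test split.
grouped_cv_error <- function(data, unit, K = 10) {
  units <- unique(data[[unit]])
  # randomly assign whole units (not rows) to the K folds
  fold_of_unit <- sample(rep_len(seq_len(K), length(units)))
  names(fold_of_unit) <- units
  fold <- fold_of_unit[as.character(data[[unit]])]

  sq_err <- numeric(0)
  for (k in seq_len(K)) {
    train <- data[fold != k, ]
    test  <- data[fold == k, ]
    # xval = 0 switches off rpart's internal row-wise cross-validation
    fit  <- rpart(Y ~ X1 + X2 + X3, data = train, xval = 0)
    pred <- predict(fit, newdata = test)
    sq_err <- c(sq_err, (test$Y - pred)^2)
  }
  mean(sq_err)  # cross-validated mean squared error, units kept intact
}
```

The returned error could then be compared across candidate values of cp to
choose the tree size, in place of the xval-based pruning table.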

Weidong Gu

On Mon, Jul 4, 2011 at 9:22 AM, Katerine Goyer <katerine.goyer at uqtr.ca> wrote:
>
> Hello,
>
> I am using the rpart function (from the rpart package) to do a regression
> tree that would describe the behaviour of a fish species according to
> several environmental variables. For each fish (sampling unit), I have
> repeated observations of the response variable, which means that the data
> are not independent. Normally, in this case, V-fold cross-validation needs
> to be modified to prevent over-optimistic predictions of error rates by
> cross-validation and overestimation of the tree size. A way to overcome
> this problem is by selecting only whole sampling units in our subsets of
> cross-validation. My problem is that I don't know how to perform this
> modification of the cross-validation process in the rpart function.
>
>
> Is there a way to do this modification in rpart, or is there any other
> function I could use that would consider interdependence in the response
> variable?
>
>
> Here is an example of the code I am using ("Y" being the response variable
> and "data.env" being a data frame of the environmental variables):
>
> Tree = rpart(Y ~ X1 + X2 + X3, xval = 100, data = data.env)
>
>
>
> Thanks
>
> Katerine
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
