[R] Using cv.tree to assign cases to specific cv-groups

jshuter at uoguelph.ca
Fri Feb 8 22:07:34 CET 2008


Hello,

I would like to use cv.tree to run a 10-fold cross-validation  
experiment on a tree object to help me choose a tree size.

Many users seem to let their cases be assigned to CV groups at random,
but I have assigned each case to one of 10 CV groups myself, so that
all of the data from each of my experimental units fall in a single
CV group.
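For illustration only, here is a minimal base-R sketch of that kind of grouping: folds are assigned to experimental units, not to individual cases, and then expanded back to one integer per case in the original row order. The names `unit`, `unit_group` and `cvgroup` are hypothetical, and the unit IDs are made up.

```r
# Hypothetical experimental-unit ID for each case (row), in dataset order.
unit <- c("A", "A", "B", "C", "C", "C", "D", "E", "E",
          "F", "G", "H", "I", "J", "J", "K")

set.seed(1)
units <- unique(unit)
# Give each unit (not each case) one of 10 groups, spread as evenly as possible.
unit_group <- sample(rep_len(1:10, length(units)))
names(unit_group) <- units

# Expand to one integer per case, keeping the original row order.
cvgroup <- unname(unit_group[unit])
cvgroup
```

Because the group is looked up by unit, every case from a given unit gets the same group number, so a unit can never be split across folds.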

According to the manual for the tree package (Ripley 2007), the
"rand" argument of cv.tree [cv.tree(object, rand, FUN = prune.tree,
K = 10)] lets the user supply an “integer vector of the length the
number of cases used to create object, assigning the cases to
different groups for cross-validation”. However, after searching the
R-help archives and various online sources, I have not found an
example of code in which someone has used this option, so I am unsure
how to proceed.

Specifically, should I:

1. Create a one-column data frame in which each case holds a number
from 1 to 10, in the same order as the cases in the original dataset
used to grow the tree object, and

2. Pass that data frame as the “rand” argument when I run the full
cv.tree call?

OR should I:

1. List the integers for the case assignments directly in the cv.tree
call, as the “rand” argument?
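A minimal sketch of the first approach, assuming only the manual's description of `rand` quoted above. `mtcars` stands in for the real data, and `unit`, `unit_group` and `cv_groups` are hypothetical names. The idea it illustrates is that `rand` wants a plain integer vector with one entry per case, in the same row order as the data used to grow the tree. A column pulled out of a data frame with `$` is such a vector, and so is a literal `c(...)` of integers, so the two options should come to the same thing. The one-column data frame itself is not a vector, so it is the column, not the data frame, that gets passed.

```r
library(tree)

# mtcars stands in for the real data; 'unit' is a hypothetical
# experimental-unit ID (16 units, 2 cases each).
dat <- mtcars
dat$unit <- rep(1:16, each = 2)

# One CV group per unit (16 units over 10 groups), expanded to one
# entry per case in row order and stored as a one-column data frame.
unit_group <- rep_len(1:10, 16)
cv_groups <- data.frame(cvgroup = unit_group[dat$unit])

# Grow the tree; 'unit' is deliberately left out of the formula.
fit <- tree(mpg ~ wt + hp + disp, data = dat)

# Pass the column (an integer vector), not the data frame itself.
cv <- cv.tree(fit, rand = cv_groups$cvgroup, FUN = prune.tree)
print(data.frame(size = cv$size, dev = cv$dev))
```

With `rand` supplied, the folds are fixed, so repeated runs should give the same `dev` values. The `size` with the smallest `dev` is then a candidate for `prune.tree(fit, best = ...)`.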

If anyone has experience using cv.tree (or another function) with
user-specified CV groups, any advice would be greatly appreciated!

Jen Shuter
University of Guelph


