[R] using xval in mvpart to specify cross validation groups
andydolman at gmail.com
Fri Mar 12 23:05:37 CET 2010
Thank you Dennis, I've got the idea now.
However, a follow-up question to make sure I'm not wasting my time.
If I specify the precise CV folds to use, should I not get the same
tree every time?
E.g. here I have a hypothetical time sequence observed with error
at 3 sites, 's'.
Suppose I leave out 1 site each time in a 3-fold CV (leaving
aside that 3-fold CV might not be a good idea).
Should I not get the same tree each time?
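library(mvpart)
# Noisy sine curve observed at 3 sites; s is the site label (1, 2, 3)
y <- rep(sin(seq(0.1, 6, 0.1)), 3)
y1 <- y + rnorm(length(y), sd = 0.5)
x <- rep(1:(length(y)/3), 3)
s <- rep(1:3, each = (length(y)/3))
dat <- data.frame(x, y1, s)
# Use the site labels as the cross-validation groups
(mvpart(y1 ~ x, data = dat, xv = "1se", xval = s))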
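One way to check this is sketched below. It rests on two assumptions that are not part of the code above: rpart stands in for mvpart (the quoted message from Terry Therneau describes the same vector form of xval for rpart), and set.seed() is added so that the rnorm() noise, and therefore the data, is the same on every run. The seed value and the names fit1 and fit2 are arbitrary. The sketch fits the model twice with the same fold vector and compares the cross-validation tables.

```r
library(rpart)  # assumption: rpart used as a stand-in for mvpart

set.seed(42)    # arbitrary seed so the simulated noise is identical on every run
y   <- rep(sin(seq(0.1, 6, 0.1)), 3)
y1  <- y + rnorm(length(y), sd = 0.5)
x   <- rep(1:(length(y) / 3), 3)
s   <- rep(1:3, each = length(y) / 3)   # site label, reused as the CV group
dat <- data.frame(x, y1, s)

# Fit twice with the same user-supplied cross-validation groups
fit1 <- rpart(y1 ~ x, data = dat, control = rpart.control(xval = s))
fit2 <- rpart(y1 ~ x, data = dat, control = rpart.control(xval = s))

# Compare the complexity tables, which include the cross-validated error
print(all.equal(fit1$cptable, fit2$cptable))
```

Without set.seed(), y1 changes each time the script is run, so differences between runs could come from the data rather than from the folds.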
Thank you for your help.
andydolman at gmail.com
On 12 March 2010 18:03, Dennis Murphy <djmuser at gmail.com> wrote:
> See inline...
> On Fri, Mar 12, 2010 at 4:15 AM, Andrew Dolman <andydolman at gmail.com> wrote:
>> Dear R's
>> I'm trying to use specific rather than random cross-validation groups
>> in mvpart.
>> The man page says:
>> xval Number of cross-validations or vector defining cross-validation groups.
>> And I found this reply to the list by Terry Therneau from 2006
>> The rpart function allows one to give the cross-validation groups
>> So if the number of observations was 10, you could use
>> > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))
>> which causes observations 1,2,7, and 10 to be left out of the first xval
>> sample, 3,4, and 9 out of the second, etc.
>> Terry Therneau
>> I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes
>> for observations 1,2,7,10 being left out of the 1st and so on.
>> x <- c(1,1,2,2,3,3,1,3,2,1)
>> which(x == 1) # elements left out of the first xval sample
> [1]  1  2  7 10
>> which(x == 2) # elements left out of the second xval sample
> [1] 3 4 9
>> which(x == 3) # elements left out of the third xval sample
> [1] 5 6 8
> This vector is used to index a response vector/model matrix.
> To see how this is applied, consider the following. y is a vector of
> length 10, the same as x:
>> y <- rpois(10, 15)
>> y
>  [1] 12 15 17 11 14 14 12 12 16 16
>> y[x != 1] # first xval sample (y[1], y[2], y[7], y[10] removed)
> [1] 17 11 14 14 12 16
>> y[x != 2] # second xval sample (y[3], y[4], y[9] removed)
> [1] 12 15 14 14 12 12 16
>> y[x != 3] # third xval sample (y[5], y[6], y[8] removed)
> [1] 12 15 17 11 12 16 16
> Indexing is one of the most important and powerful features of R.
>> Can anyone fill me in please?
>> andydolman at gmail.com
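The indexing described in the quoted reply can be collected into one loop over the folds. This is only a sketch: the seed and the names folds, held_out and train are made up for illustration, and the rpois() draw will not reproduce the y values shown in the quoted transcript.

```r
# Fold labels from the quoted example: observation i is held out in fold x[i]
x <- c(1, 1, 2, 2, 3, 3, 1, 3, 2, 1)

set.seed(1)         # arbitrary seed, only so the sketch is repeatable
y <- rpois(10, 15)  # response vector of the same length as x

# For each fold, record the held-out positions and the values kept for fitting
folds <- lapply(sort(unique(x)), function(k) {
  list(fold = k, held_out = which(x == k), train = y[x != k])
})

for (f in folds) {
  cat("fold", f$fold, "- held out:", f$held_out,
      "- kept:", length(f$train), "values\n")
}
```

Each observation appears in exactly one held_out set, which is what makes the vector a valid definition of cross-validation groups.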