[R] rpart for CART with weights/priors

Fri May 7 18:14:44 CEST 2004

Hi,
I have a technical question about rpart:
according to Breiman et al. (1984), different misclassification costs in
CART can be modelled either by modifying the loss matrix or by using
different prior probabilities for the classes, which in turn should have
the same effect as using different weights for the response classes.
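
For reference, the prior-based variant is usually stated in terms of
"altered priors": each class prior is multiplied by the cost of
misclassifying that class, and the result is renormalised. Below is a
minimal sketch in base R. The helper name altered_prior and the cost vector
are illustrative assumptions, not part of rpart; the class counts (64
absent, 17 present) are those of the kyphosis data.

```r
# Hypothetical helper, not an rpart function: rescale class priors by the
# cost of misclassifying each class, then renormalise so they sum to 1.
altered_prior <- function(prior, cost) {
  stopifnot(length(prior) == length(cost), all(prior >= 0), all(cost > 0))
  scaled <- prior * cost
  scaled / sum(scaled)
}

# Observed class frequencies in kyphosis: 64 absent, 17 present
freq <- c(absent = 64, present = 17) / 81

# Assumed costs: misclassifying a "present" case is twice as costly
altered_prior(freq, cost = c(1, 2))
# absent ~0.653, present ~0.347 (i.e. 64/98 and 34/98)
```

Under these assumptions the altered prior depends on the class frequencies
in the data; the same costs give c(1/3, 2/3) only when both classes are
equally frequent.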

What I tried was this:

library(rpart)
data(kyphosis)

#fit1 from original unweighted data set
fit1 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)

#modify loss matrix
loss<-matrix(c(0,1,2,0),nrow=2,ncol=2)

#   true class?
#    [,1] [,2]
#[1,]    0    2 
#[2,]    1    0 predicted class?

#modify priors
prior=c(1/3,2/3)

fit2<- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
parms=list(loss=loss))
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
parms=list(prior=prior))

fit2
fit3

par(mfrow=c(2,1))
plot(fit2)
text(fit2,use.n=T)
plot(fit3)
text(fit3,use.n=T)

#fit2 and fit3 lead to similar but not identical trees (similar topology but
#different cutoff points), while all other combinations (even complete
#reversal, i.e. preference for the other class) lead to totally different
#trees...

#third approach using weights:
#sorting of data to design weight vector
ind<-order(kyphosis[,1])
kyphosis1<-kyphosis[ind,]

summary(kyphosis1[,1])
weight<-c(rep(1,64),rep(2,17))
summary(as.factor(weight))

fit4 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis1,
weights=weight)

#leads to a result very similar to fit2 with
#loss<-matrix(c(0,1,2,0),nrow=2,ncol=2)
#(same tree and cutoff points, but slightly different probabilities, maybe
#a numerical artefact?)

fit4
plot(fit4)
text(fit4,use.n=T)

#double check with inverse loss matrix

loss<-matrix(c(0,1,2,0),nrow=2,ncol=2,byrow=T)
fit2<- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
parms=list(loss=loss))

weight<-c(rep(2,64),rep(1,17))
fit4 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis1,
weights=weight)

fit2
fit4
#also same except for probabilities yprob

I don't see 
1. why the approach using prior probabilities doesn't work
2. what causes the differences in predicted probabilities in the weights approach

Any idea? Thank You! C.
