[R] rpart - I'm confused by the loss matrix

Thu Nov 9 17:03:27 CET 2006

Hello,

I could not find anywhere in the rpart help which element of the loss
matrix corresponds to which loss, so I experimented with this parameter
and became a bit confused.
Here is what I did. I used the kyphosis data (classification
absent/present; 64 'absent' cases and 17 'present' cases) and tried
the following:

> lmat=matrix(c(0,17,64,0),ncol=2)
> lmat
     [,1] [,2]
[1,]    0   64
[2,]   17    0
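
For reference, a small sketch that checks the class counts and the factor level order, and prints a labelled copy of the matrix. The names `lv` and `labelled` are introduced only for this illustration, and labelling both dimensions by level order is an assumption about how the matrix is indexed; it does not by itself say which dimension is the true class and which the predicted one.

```r
library(rpart)  # provides the kyphosis data set

# Class counts and factor level order of the response
table(kyphosis$Kyphosis)
levels(kyphosis$Kyphosis)

# Same matrix as above
lmat <- matrix(c(0, 17, 64, 0), ncol = 2)

# Labelled copy for reading only (assumes indexing follows level order)
lv <- levels(kyphosis$Kyphosis)
labelled <- lmat
dimnames(labelled) <- list(row = lv, col = lv)
labelled
```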

> set.seed(1003)
> fit1<-rpart(Kyphosis~.,data=kyphosis,parms=list(loss=lmat))

> set.seed(1003)
> fit2<-rpart(Kyphosis~.,data=kyphosis,parms=list(prior=c(0.5,0.5)))

The two fits were identical, so I concluded that the losses, written
as L(true, predicted), were:

L(absent,present)=17
L(present,absent)=64.
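
The arithmetic behind this deduction can be sketched as follows. It assumes the two-class relationship described in the rpart introductory vignette, where a loss matrix acts like altered priors proportional to the class prior times the loss for that true class. The names `n`, `prior`, `loss_by_true` and `altered` are made up for this sketch, and the losses are the hypothesised reading above, not a confirmed fact about rpart.

```r
# Class frequencies in the kyphosis data (64 absent, 17 present)
n <- c(absent = 64, present = 17)
prior <- n / sum(n)

# Hypothesised reading: loss attached to misclassifying each true class
loss_by_true <- c(absent = 17, present = 64)

# Altered priors: prior times loss, rescaled to sum to 1
altered <- prior * loss_by_true / sum(prior * loss_by_true)
altered
```

Under this reading both altered priors equal 0.5 (up to floating-point rounding), which would explain why fit1 matches the fit with prior=c(0.5,0.5).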

The arrangement of the elements in the loss matrix therefore seemed
clear, with absent treated as class 1 and present as class 2, and my
problem seemed to be solved. However, I also tried this:

> residuals(fit1)

and became confused: for each misclassified 'absent' the residual
(which should be the loss in this case) was 64, while for each
misclassified 'present' it was 17, in contradiction to the above.
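
A minimal sketch of how the residuals can be lined up against the true and predicted classes, so that the loss charged to each kind of misclassification is visible in one table. The data frame `res` and its column names are invented for this illustration; the fit is the same as fit1 above.

```r
library(rpart)

lmat <- matrix(c(0, 17, 64, 0), ncol = 2)
set.seed(1003)
fit1 <- rpart(Kyphosis ~ ., data = kyphosis, parms = list(loss = lmat))

# One row per observation: true class, fitted class, residual
res <- data.frame(
  true      = kyphosis$Kyphosis,
  predicted = predict(fit1, type = "class"),
  residual  = residuals(fit1)
)

# Distinct (true, predicted, residual) combinations among the errors
unique(res[res$true != res$predicted, ])
```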

Am I wrong somewhere? Is the arrangement of the elements in the loss
matrix really what I deduced from comparing fit1 and fit2?

Thanks for any comments.

Barbora