[R] rpart - I'm confused by the loss matrix
Barbora Arendacká
barendacka at gmail.com
Thu Nov 9 17:03:27 CET 2006
Hello,
As I couldn't find anywhere in the rpart help page which element of the
loss matrix corresponds to which loss, I experimented with this parameter
and became a bit confused.
Here is what I did.
I used the kyphosis data (classification absent/present; there are 64
'absent' cases and 17 'present' cases)
and tried the following:
> library(rpart)
> lmat=matrix(c(0,17,64,0),ncol=2)
> lmat
     [,1] [,2]
[1,]    0   64
[2,]   17    0
> set.seed(1003)
> fit1<-rpart(Kyphosis~.,data=kyphosis,parms=list(loss=lmat))
> set.seed(1003)
> fit2<-rpart(Kyphosis~.,data=kyphosis,parms=list(prior=c(0.5,0.5)))
The two fits were identical, so I concluded that the losses, written as
L(true, predicted), were:
L(absent,present)=17
L(present,absent)=64.
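The deduction can be checked with the altered-prior argument used in CART, assuming rpart turns a loss matrix into altered priors in that way. Here c_k is the loss for misclassifying a case whose true class is k, and the empirical priors are 64/81 and 17/81:

$$
\tilde\pi_k = \frac{\pi_k\, c_k}{\pi_{\text{absent}}\, c_{\text{absent}} + \pi_{\text{present}}\, c_{\text{present}}},
\qquad \pi_{\text{absent}} = \tfrac{64}{81},\quad \pi_{\text{present}} = \tfrac{17}{81}.
$$

Requiring $\tilde\pi_{\text{absent}} = \tilde\pi_{\text{present}} = 0.5$, as with prior=c(0.5,0.5), gives

$$
\tfrac{64}{81}\, c_{\text{absent}} = \tfrac{17}{81}\, c_{\text{present}}
\;\Longrightarrow\; c_{\text{absent}} : c_{\text{present}} = 17 : 64,
$$

that is L(absent,present)=17 and L(present,absent)=64. Only the ratio is determined; multiplying both losses by the same constant leaves the altered priors unchanged.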
And thus the arrangement of the elements in the loss matrix seemed
clear as absent is considered as class 1 and present as class 2 and my
problem seemed to be solved. However, I tried also this:
> residuals(fit1)
and became confused: for each misclassified 'absent' case the
residual (which should be the loss in this case) was 64, while for each
misclassified 'present' case it was 17, which contradicts the conclusion above.
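To make this second observation easy to reproduce, the residuals can be cross-tabulated against the true and predicted classes. This is a minimal sketch assuming the rpart package (which ships the kyphosis data) is installed; `pred` is only an illustrative variable name.

```r
library(rpart)

# The loss matrix from the post (column-major: [1,2] = 64, [2,1] = 17)
lmat <- matrix(c(0, 17, 64, 0), ncol = 2)
set.seed(1003)
fit1 <- rpart(Kyphosis ~ ., data = kyphosis, parms = list(loss = lmat))

# Predicted class of each training case (factor: "absent", "present")
pred <- predict(fit1, type = "class")

# Which residual value does rpart report for each (true, predicted) pair?
print(table(true = kyphosis$Kyphosis, predicted = pred,
            residual = residuals(fit1)))
```

Each non-empty cell with a non-zero residual shows which loss the residuals method charges for that (true, predicted) combination.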
So am I wrong somewhere? Is the arrangement of elements in the loss
matrix really as I deduced it from fit1 and fit2?
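The two possible readings of lmat can be compared directly by computing the altered priors each one implies. This sketch assumes the CART-style rule (prior times the total loss of misclassifying a class, normalised); `altered_prior` is a hypothetical helper written for illustration, not an rpart function.

```r
# Class counts in the kyphosis data, as given in the post
n <- c(absent = 64, present = 17)
prior <- n / sum(n)

# The loss matrix from the post, with class names attached for readability
lmat <- matrix(c(0, 17, 64, 0), ncol = 2,
               dimnames = list(names(n), names(n)))

# Hypothetical helper (assumption: CART-style altered priors).
# Rows of `loss` are taken to be the TRUE class, so rowSums() gives the
# total loss of misclassifying a case of each class.
altered_prior <- function(prior, loss) {
  w <- prior * rowSums(loss)
  w / sum(w)
}

# Reading 1: rows = true class, columns = predicted class
p_rows_true <- altered_prior(prior, lmat)
# Reading 2: columns = true class (equivalently, use the transpose)
p_cols_true <- altered_prior(prior, t(lmat))

print(round(p_rows_true, 3))  # absent ~ 0.934, present ~ 0.066
print(round(p_cols_true, 3))  # absent = 0.5, present = 0.5
```

Under this assumption only the second reading gives equal altered priors, which is in line with the deduction from fit1 and fit2; whether rpart itself follows this convention is exactly what the question asks.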
Thanks for any comments.
Barbora
More information about the R-help mailing list