[R] bug in rpart?

Uwe Ligges ligges at statistik.tu-dortmund.de
Fri May 22 19:43:57 CEST 2009



Yuanyuan wrote:
> Greetings,
> 
> I checked the Indian diabetes data again and got one tree for the data with
> reordered columns and another tree for the original data. I compared these
> two trees: the split points are exactly the same, but the fitted classes
> differ for some cases, and the misclassification errors differ too. I know
> how CART deals with ties: even when we use the same data, the subjects sent
> to the left and right may not be the same if we just rearrange the order of
> the covariates.
> 
> But the problem is that the fitted trees have exactly the same split
> points. Shouldn't we get the same fitted values if the decisions are the
> same at each step? Why do trees with the same structure have different
> observations in their nodes?


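Because they may use different surrogate variables. Note that your data 
contain missing values, and rpart handles those with surrogates: an 
observation whose primary split variable is missing is sent left or right 
according to a surrogate split instead. Two trees can therefore show the 
same primary splits and still route such observations differently if the 
surrogates chosen differ, which may depend on the order of the columns.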
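
One way to check this is sketched below. It assumes the same two fits as in the quoted code, and the names p2, p3 and incomplete are only illustrative. It compares the primary splits stored in the frame, then the full splits matrix (which also holds the surrogate splits), and finally cross-tabulates the disagreements in fitted class against whether a row has any missing value.

```r
library(mlbench)
library(rpart)

data(PimaIndiansDiabetes2)
mydata  <- PimaIndiansDiabetes2
pmydata <- mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)]  # glucose and mass exchanged

fit2 <- rpart(diabetes ~ ., data = mydata,  method = "class")
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")

# Primary split variable at each node (reported as identical in this thread)
identical(as.character(fit2$frame$var), as.character(fit3$frame$var))

# The splits matrix lists primary, competitor and surrogate splits in order;
# its row names are the variables used, so a difference here points at
# competitors or surrogates rather than at the primary splits
identical(rownames(fit2$splits), rownames(fit3$splits))

# Fitted classes on the training data for both trees
p2 <- predict(fit2, type = "class")
p3 <- predict(fit3, type = "class")

# Rows with at least one missing predictor are the ones that may need surrogates
incomplete <- !complete.cases(mydata)

# If surrogates are the cause, disagreements should sit among incomplete rows
table(disagree = p2 != p3, has_missing = incomplete)
```

summary(fit2) and summary(fit3) print the surrogate splits node by node, which makes it easier to see where the two fits part ways.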

Best,
Uwe Ligges





> The source code for running the diabetes data example and the output of
> trees are attached. Your professional opinion is very much appreciated.
> 
> library(mlbench)
> data(PimaIndiansDiabetes2)
> mydata <- PimaIndiansDiabetes2
> library(rpart)
> fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
> plot(fit2, uniform = TRUE, main = "CART for original data")
> text(fit2, use.n = TRUE, cex = 0.6)
> printcp(fit2)
> table(predict(fit2, type = "class"), mydata$diabetes)
> ## misclassification table: rows are fitted class
>       neg pos
>   neg 437  68
>   pos  63 200
> 
> 
> pmydata <- mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)]
> fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
> plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
> text(fit3, use.n = TRUE, cex = 0.6)
> printcp(fit3)
> table(predict(fit3, type = "class"), pmydata$diabetes)
> ## after exchanging the order of body mass and plasma glucose
>       neg pos
>   neg 436  64
>   pos  64 204
> 
> 
> Best,
> 
> 
> 



