[R] rpart - classification and regression trees (CART)

Katie N knishimura at gmail.com
Mon Dec 14 16:28:14 CET 2009


Actually, that's the first thing I thought too, but they weren't listed in
that order in my model statement (model that I used is below):

library(rpart)
fit <- rpart(pres ~ TB144 + TB118 + TB129 + TB139 + TB114 + TB131 + TB122,
             method = "class", data = data8)

When several splits have the same improvement, would the choice among them
have anything to do with the Gini index?  I read on another site that the
best split is the one that produces the most homogeneous child nodes (that
is, the largest reduction in impurity as measured by the Gini index).  TB122
does have less variability (i.e., a smaller standard deviation around the
mean) than the others.  Could that be why it was chosen despite having the
same "level of merit" as the other predictors?
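
For reference, here is a minimal sketch of how a Gini-based improvement can
be computed by hand for a binary split.  The function names (gini,
gini_improve) and the toy vectors y, x1 and x2 are hypothetical and are not
taken from data8.  The scaling by node size is an assumption about how rpart
reports "improve" for method="class" and should be checked against the rpart
documentation.

```r
# Gini impurity of a vector of class labels: 1 - sum over classes of p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Decrease in Gini impurity for a binary split, scaled by the node size n.
# Assumption: this scaling mirrors rpart's "improve" for classification.
gini_improve <- function(y, go_left) {
  n       <- length(y)
  n_left  <- sum(go_left)
  n_right <- n - n_left
  n * gini(y) - n_left * gini(y[go_left]) - n_right * gini(y[!go_left])
}

# Hypothetical toy data: two predictors with very different spread
y  <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
x1 <- c(10, 20, 30, 40, 50, 60, 85, 90, 95, 99)
x2 <- c(1, 5, 300, 2, 70, 8, 950, 990, 2000, 5000)

# Both cut points send the same six observations to the left
imp1 <- gini_improve(y, x1 < 80)
imp2 <- gini_improve(y, x2 < 900)

print(c(x1 = imp1, x2 = imp2))
print(c(sd_x1 = sd(x1), sd_x2 = sd(x2)))
```

In this sketch the improvement depends only on the class labels that land
in each child node.  Two predictors that induce the same left/right
partition therefore tie exactly, whatever their standard deviations.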





Therneau, Terry M., Ph.D. wrote:
> 
> When two variables have exactly the same figure of merit, they will be
> listed in the output in the same order in which they appeared in your
> model statement.  
>    Terry Therneau
> 
> --- begin inclusion ---
> I had a question regarding the rpart command in R.  I used seven continuous
> predictor variables in the model and the variable called "TB122" was chosen
> for the first split.  But in looking at the output, there are 4 variables
> that improve the predicted membership equally (TB122, TB139, TB144, and
> TB118) - output pasted below.
> 
> Node number 1: 268 observations,    complexity param=0.6
>   predicted class=0  expected loss=0.3
>     class counts:   197    71
>    probabilities: 0.735 0.265 
>   left son=2 (188 obs) right son=3 (80 obs)
>   Primary splits:
>       TB122 < 80  to the left,  improve=50, (0 missing)
>       TB139 < 90  to the left,  improve=50, (0 missing)
>       TB144 < 90  to the left,  improve=50, (0 missing)
>       TB118 < 90  to the left,  improve=50, (0 missing)
>       TB129 < 100 to the left,  improve=40, (0 missing)
> 
> --- end inclusion ---




