[R] Question about rpart decision trees (being used to predict customer churn)

Carlos J. Gil Bellosta cgb at datanalytics.com
Sat Aug 1 21:24:06 CEST 2009


Hello,

If you do

my.tree <- rpart(cancel ~ experience)

and then you check

my.tree$frame

you will note that the complexity parameter there is 0. 

Check ?rpart.object for a description of what this output means. In
essence, you will not be able to split that leaf unless you set a
complexity parameter below that value, that is, never.

You may need to go into the internals of the function (and its C code)
to understand how this parameter is calculated. It looks like an oddity
to me, and it is worth trying to understand why.
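
One possible way to see where the 0 might come from, as a sketch in base R
only: it assumes (this is not verified against the rpart sources) that the
complexity value reflects the drop in misclassified cases, while the split
itself is chosen by Gini gain. The names root.errors, split.errors, gini and
gini.gain are made up for this illustration.

```r
# Data from the original question.
experience <- factor(c(rep("good", 90), rep("bad", 10)))
cancel <- factor(c(rep("no", 85), rep("yes", 5), rep("no", 5), rep("yes", 5)))
tab <- table(experience, cancel)

# Misclassified cases when every node predicts its majority class.
root.errors  <- sum(tab) - max(colSums(tab))            # root node only
split.errors <- sum(rowSums(tab) - apply(tab, 1, max))  # after splitting on experience

# Gini impurity of a vector of class counts.
gini <- function(counts) 1 - sum((counts / sum(counts))^2)

# Weighted impurity of the two children, and the gain over the root.
child.gini <- sum(rowSums(tab) / sum(tab) * apply(tab, 1, gini))
gini.gain  <- gini(colSums(tab)) - child.gini

print(c(root.errors = root.errors, split.errors = split.errors))
print(gini.gain)
```

Both error counts come out as 10, so splitting on experience removes no
misclassified cases even though the Gini gain is positive. If the complexity
value is indeed based on the error reduction, a value of 0 would be the
expected result for this data.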

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


P.S.: Note that there is a bug in your submitted code that requires some
hand fixing.



On Sun, 2009-07-26 at 11:37 -0700, Robert Smith wrote:
> Hi,
> 
> I am using rpart decision trees to analyze customer churn. I am finding that
> the decision trees created are not effective because they are not able to
> recognize factors that influence churn. I have created an example situation
> below. What do I need to do for rpart to build a tree with the variable
> experience? My guess is that this would happen if rpart used the loss matrix
> while creating the tree.
> 
> > experience <- as.factor(c(rep("good",90), rep("bad",10)))
> > cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5),
> rep("yes",5)))
> > table(experience, cancel)
>           cancel
> experience no yes
>       bad   5   5
>       good 85   5
> > rpart(cancel ~ experience)
> n= 100
> node), split, n, loss, yval, (yprob)
>       * denotes terminal node
> 1) root 100 10 no (0.9000000 0.1000000) *
> 
> I tried the following commands with no success.
> rpart(cancel ~ experience, control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(split='information'))
> rpart(cancel ~ experience, parms=list(split='information'),
> control=rpart.control(cp=.0001))
> rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,10000,0), nrow=2,
> ncol=2)))
> 
> Thanks a lot for your help.
> 
> Best regards,
> Robert



