[R] rcart - classification and regression trees (CART)

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Dec 16 14:44:06 CET 2009


Katie N wrote:
> Hi,
> I am trying to use CART to find an ideal cut-off value for a simple
> diagnostic test (ie when the test score is above x, diagnose the condition). 
> When I put in the model 
> 
> fit=rpart(outcome ~ TB144, method="class", data=data8)
> 
> sometimes it gives me a tree with multiple nodes for the same predictor (see
> below for example of tree with 1 or multiple nodes).  Is there a way to tell
> it to make only 1 node?  Or is it safe to assume that the cut-off value on
> the primary node is the ideal cut-off?
> 
> Thanks!
> Katie
> 
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB144n.jpg 
> 
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB122n.jpg 
> 
> 

Katie,

Do note that the strategy you are using is inconsistent with decision 
theory.  Optimal decisions have to condition on everything you know 
about a single patient, and do not ask the question "to what group does 
this patient belong?".  For example, we estimate something given the 
patient's age is 20 instead of given that her age is less than 60. 
That's why logistic regression is used so frequently to estimate 
probabilities of disease.  Any cutoff that must be used has to be on the 
predicted probability scale in order to get an optimum decision, and 
that cutoff must be specified by the provider of the utility function. 
Even then the cutoff is not fully trusted, e.g., a physician may order 
another test at the last minute when the probability of disease is in a 
gray zone.
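
To make that concrete, here is a minimal sketch in R. It assumes simulated 
data (the column names outcome and TB144 mirror your call, but the values are 
made up), a plain glm() fit, and two hypothetical misclassification costs 
(cost_fp, cost_fn) standing in for the utility function. It is an 
illustration under those assumptions, not an analysis of your data.

```r
# Simulated data; column names mirror the original post, values are made up.
set.seed(1)
n <- 500
TB144 <- rnorm(n, mean = 50, sd = 10)
outcome <- rbinom(n, size = 1, prob = plogis(-8 + 0.15 * TB144))
data8 <- data.frame(outcome = outcome, TB144 = TB144)

# Logistic regression: probability of disease as a smooth function of the score.
fit <- glm(outcome ~ TB144, family = binomial, data = data8)

# Predicted probability for one patient, conditioning on her exact score
# (62 is an arbitrary illustrative value).
p_hat <- predict(fit, newdata = data.frame(TB144 = 62), type = "response")

# Hypothetical costs supplied by the decision maker, not estimated from data.
cost_fp <- 1  # cost of diagnosing a patient who does not have the condition
cost_fn <- 4  # cost of missing a patient who does have the condition

# Diagnose when the expected cost of doing so is lower than that of not doing so:
# (1 - p) * cost_fp < p * cost_fn  <=>  p > cost_fp / (cost_fp + cost_fn)
p_cut <- cost_fp / (cost_fp + cost_fn)

decision <- if (p_hat > p_cut) "diagnose" else "do not diagnose"
cat(sprintf("P(disease | TB144 = 62) = %.3f, cutoff = %.2f, decision: %s\n",
            p_hat, p_cut, decision))
```

The cutoff here lives on the probability scale and changes whenever the costs 
change; the test score itself is never dichotomized.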

Frank
-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



