[R] rcart - classification and regression trees (CART)
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Dec 16 14:44:06 CET 2009
Katie N wrote:
> Hi,
> I am trying to use CART to find an ideal cut-off value for a simple
> diagnostic test (ie when the test score is above x, diagnose the condition).
> When I put in the model
>
> fit = rpart(outcome ~ TB144, method="class", data=data8)
>
> sometimes it gives me a tree with multiple nodes for the same predictor (see
> below for example of tree with 1 or multiple nodes). Is there a way to tell
> it to make only 1 node? Or is it safe to assume that the cut-off value on
> the primary node is the ideal cut-off?
>
> Thanks!
> Katie
>
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB144n.jpg
>
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB122n.jpg
>
>
Katie,
Do note that the strategy you are using is inconsistent with decision
theory. Optimal decisions have to condition on everything you know
about a single patient, and do not ask the question "to what group does
this patient belong?". For example, we estimate something given the
patient's age is 20 instead of given that her age is less than 60.
That's why logistic regression is used so frequently to estimate
probabilities of disease. Any cutoff that must be used has to be on the
predicted probability scale in order to get an optimum decision, and
that cutoff must be specified by the provider of the utility function.
Even then the cutoff is not fully trusted, e.g., a physician may order
another test at the last minute when the probability of disease is in a
gray zone.
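
A minimal sketch of the approach described above, in R. The data here are
simulated as a stand-in for data8 (the real data are not shown in this
thread), and the patient score of 62 and the two costs are purely
illustrative assumptions; in practice the costs come from whoever supplies
the utility function, not from the data.

```r
# Hypothetical stand-in for data8; the actual data are not available here.
set.seed(1)
n <- 200
TB144 <- rnorm(n, mean = 50, sd = 10)
outcome <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * TB144))
data8 <- data.frame(outcome = outcome, TB144 = TB144)

# Model the probability of disease as a function of the continuous score,
# rather than dichotomizing the score.
fit <- glm(outcome ~ TB144, family = binomial, data = data8)

# Predicted probability given one patient's actual score (62 is illustrative).
p_hat <- predict(fit, newdata = data.frame(TB144 = 62), type = "response")

# Assumed costs, to be specified by the decision maker:
cost_fp <- 1  # diagnosing a patient who does not have the condition
cost_fn <- 4  # missing a patient who does have the condition

# Diagnosing minimizes expected cost when p * cost_fn > (1 - p) * cost_fp,
# i.e. when p exceeds cost_fp / (cost_fp + cost_fn).
p_cut <- cost_fp / (cost_fp + cost_fn)

decision <- if (p_hat >= p_cut) "diagnose" else "do not diagnose"
```

Note that the cutoff is applied to p_hat, not to TB144 itself, and that it
changes whenever the assumed costs change.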
Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University