[R] CHAID in R
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Fri Nov 15 23:38:11 CET 2013
On Sat, 16 Nov 2013, Preetam Pal wrote:
> Hi,
>
> I have a data set on credit rating for customers in a bank (Rating is 1
> for defaulter, 0 = non-defaulter). I have 10 predictor variables
> (C1,C2,.....,C10) . I want to build a CHAID Tree using R for
> classification. How do I do this? For your perusal, the data set is
> attached. Thanks in advance.
The classical CHAID algorithm is implemented in a package on R-Forge:
https://R-Forge.R-project.org/R/?group_id=343
However, this only supports categorical covariates and hence is not useful
for your data.
Alternatively, you might want to try out other packages for learning
classification trees, e.g., partykit or rpart. See also
http://CRAN.R-project.org/view=MachineLearning
For your data you could do:
## read data with factor response
d <- read.table("text.txt", header = TRUE)
d$Rating <- factor(d$Rating)
## ctree
library("partykit")
ct <- ctree(Rating ~ ., data = d)
plot(ct)
## rpart
library("rpart")
rp <- rpart(Rating ~ ., data = d, control = rpart.control(cp = 0.02))
plot(as.party(rp))
## evtree
library("evtree")
set.seed(1)
ev <- evtree(Rating ~ ., data = d, maxdepth = 5)
plot(ev)
All three methods agree that the decisive split is in C2 at about -110.
You may be able to infer further splits within the C2 < -110 subsample,
but there the methods disagree somewhat.
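If a CHAID tree is nevertheless required, one option would be to discretize the numeric covariates into factors first, since the R-Forge implementation only accepts categorical covariates. The following is only a rough sketch under several assumptions: the data frame here is simulated as a stand-in for the real file, the helper to_factor() and the choice of four quantile-based groups are hypothetical, and the CHAID fit is attempted only if that package happens to be installed. The binning is arbitrary and will affect the resulting tree.

```r
## Simulated stand-in for the real data (hypothetical values, for illustration only)
set.seed(1)
n <- 200
d <- data.frame(C1 = rnorm(n), C2 = rnorm(n, mean = -100, sd = 50))
d$Rating <- factor(as.integer(d$C2 < -110))

## Hypothetical helper: bin a numeric vector into quantile-based ordered groups
to_factor <- function(x, groups = 4) {
  if (!is.numeric(x)) return(factor(x))
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = groups + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE, ordered_result = TRUE)
}

## Discretize every covariate, leaving the response untouched
covars <- setdiff(names(d), "Rating")
d_cat <- d
d_cat[covars] <- lapply(d[covars], to_factor)

## Fit CHAID only if the package is available; it is typically installed with
## install.packages("CHAID", repos = "http://R-Forge.R-project.org")
if (requireNamespace("CHAID", quietly = TRUE)) {
  ch <- CHAID::chaid(Rating ~ ., data = d_cat)
  plot(ch)
}
```

With the real data, replace the simulated d by the data frame read from the file above. Using fewer or more groups is a judgment call, and the split points found by ctree or rpart may suggest more meaningful cut-offs than quantiles.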
Best,
Z
> -Preetam
>
> --
> Preetam Pal
> (+91)-9432212774
> M-Stat 2nd Year, Room No. N-114
> Statistics Division, C.V.Raman
> Hall
> Indian Statistical Institute, B.H.O.S.
> Kolkata.
>