[R] Chaid Decision Tree

Achim Zeileis Achim.Zeileis at uibk.ac.at
Mon Aug 22 19:24:48 CEST 2016


On Mon, 22 Aug 2016, MIKE DE LA HOZ wrote:

>
> Hi,
>
>
> I am fitting a CHAID tree using the Titanic dataset (see attachment)
>
>
>
> setwd("C:/Users/miguel")
>
> titanic <- read.csv("train.csv")
> titanic.s <- subset( titanic, select = -c(PassengerId, Name ) )
>
> ctrl <- chaid_control(minsplit = 20, minbucket = 5, minprob = 0)
> chaidTitanic <- chaid(Survived ~ ., data = titanic, control = ctrl)
>
>
>
> It looks like I get the following error
>
> Error: is.factor(x) is not TRUE
>
>
>
> Can you please help me here? I am not able to follow this type of error. If you could explain the message to me, it would be much appreciated.

To apply the chaid() function, all variables (both the response and the 
predictors) need to be categorical, i.e., of class "factor" in R.

It is not clear which variables are the culprits here because your example 
is not reproducible. I guess that there are at least some numeric 
regressor variables. Maybe the "Survived" response is also in numeric 
dummy coding rather than the appropriate "factor" variable.
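
As a rough sketch of how one might track down and fix the culprits, the 
following uses a small made-up data frame as a stand-in for the real file 
(the values and the age cut points are purely illustrative assumptions, not 
taken from your data). It lists the columns that are not factors and then 
converts them:

```r
# Toy stand-in for the Titanic data (hypothetical values, not the real file)
d <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Pclass   = c(3, 1, 3, 1, 2, 2),
  Sex      = factor(c("male", "female", "female", "male", "female", "male")),
  Age      = c(22, 38, 26, 35, 27, 54)
)

# Columns that are not factors; each would fail an is.factor() check
not_factor <- names(d)[!vapply(d, is.factor, logical(1))]
print(not_factor)

# Numerically coded categories can be turned into factors directly
d$Survived <- factor(d$Survived, levels = c(0, 1), labels = c("no", "yes"))
d$Pclass   <- factor(d$Pclass)

# Genuinely numeric variables have to be binned first
# (the break points here are arbitrary and only for illustration)
d$Age <- cut(d$Age, breaks = c(0, 18, 40, Inf),
             labels = c("child", "adult", "older"))

# Now every column is a factor
str(d)
```

Note that binning a numeric variable such as age discards information, 
which is one reason to consider a method that handles numeric regressors 
directly.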

In any case, I would recommend using a tree model that can deal with both 
kinds of regressor variables. If you want something that selects split 
variables and split points based on statistical tests, ctree() from 
package "partykit" would be the obvious candidate.
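
A minimal sketch of what that could look like, assuming "partykit" is 
installed and using simulated data in place of your file (the variable 
names mimic the Titanic data, but the values and the survival 
probabilities are invented for illustration):

```r
library("partykit")

# Simulated stand-in data: one factor and one numeric regressor
set.seed(1)
n <- 200
d <- data.frame(
  Sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age = round(runif(n, min = 1, max = 70))
)

# Invented survival probabilities that depend on Sex only
p <- ifelse(d$Sex == "female", 0.8, 0.2)
d$Survived <- factor(rbinom(n, size = 1, prob = p),
                     levels = 0:1, labels = c("no", "yes"))

# ctree() accepts factor and numeric regressors side by side;
# the response should still be a factor for a classification tree
ct <- ctree(Survived ~ Sex + Age, data = d)
print(ct)

# Predicted classes for the training data
pred <- predict(ct, newdata = d)
table(observed = d$Survived, predicted = pred)
```

With your own data, the same call would be applied after converting 
"Survived" to a factor and dropping identifier-like columns such as the 
passenger names.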

>
> Thanks
>
>


