[R] Decision Tree: Am I Missing Anything?
Bhupendrasinh Thakre
vickythakre at gmail.com
Fri Sep 21 05:37:16 CEST 2012
I'm not sure what the problem is, because I couldn't load your data and run it myself. You might want to use dput() to post the data in a form that others can paste straight into R.
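For what it's worth, here is a minimal sketch of what posting data with dput() could look like. The rows and columns are copied from the first three brands in the post; the object name test.df follows the code in the question, and the round trip at the end is only there to show that the printed text rebuilds the same object.

```r
# Sketch: sharing a data frame on the list with dput().
# Values are the first three rows / first three predictors from the post.
test.df <- data.frame(
  BRND = c("Brand 1", "Brand 2", "Brand 3"),
  PRI  = c(0.6989, 0.8621, 0.6),
  PROM = c(0.4731, 0.3793, 0.1),
  FORM = c(0.7849, 0.8621, 0.6),
  stringsAsFactors = TRUE
)

# dput() prints R code that recreates the object; paste that into the email.
dput(test.df)

# Round trip: capture the dput() text and rebuild the object from it.
txt <- paste(capture.output(dput(test.df)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

Anyone reading the message can then assign the pasted structure(...) text to a name and get the same data frame, column types included.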
Now for the programming side. As far as I can see, the brand variable has more than 2 levels, and so method = "class" is not able to work out what you actually want from it.
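A quick way to check how many levels the response has, and how many rows sit behind each level, might look like the sketch below. The vector brnd is only a stand-in that rebuilds the BRND column from the post (Brand 1 through Brand 22, one row each); with the real data one would use test.df$BRND instead.

```r
# Sketch: count the response levels and the rows available per level.
# `brnd` is a stand-in for test.df$BRND, rebuilt from the labels in the post.
brnd <- factor(paste("Brand", 1:22), levels = paste("Brand", 1:22))

n_levels  <- nlevels(brnd)   # number of classes in the response
per_level <- table(brnd)     # number of rows for each class

# Error rate from always predicting one single class; this is the quantity
# reported as "Root node error: 21/22" in the printcp() output.
root_error <- 1 - max(per_level) / length(brnd)

print(n_levels)
print(root_error)
```

Here every brand shows up exactly once, which is worth keeping in mind when reading the printcp() output quoted below.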
Suggestion: for predictions with more than 2 levels I would go with Weka, specifically the C4.5 algorithm. The RWeka package gives you access to it from R.
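As a rough sketch of that suggestion, assuming Java and the RWeka package are installed: J48() is RWeka's interface to Weka's C4.5 implementation. Only the first six brands and three of the predictors from the post are typed in here, so treat it as an illustration of the call rather than an analysis.

```r
# Sketch: a C4.5-style tree through RWeka::J48 (needs Java plus RWeka).
# Data: first six rows and three predictors copied from the post.
test.df <- data.frame(
  BRND = factor(paste("Brand", 1:6)),
  PRI  = c(0.6989, 0.8621, 0.6, 0.6429, 0.7586, 0.75),
  PROM = c(0.4731, 0.3793, 0.1, 0.25, 0.4224, 0.0833),
  FORM = c(0.7849, 0.8621, 0.6, 0.5714, 0.7328, 0.5833)
)

if (requireNamespace("RWeka", quietly = TRUE)) {
  # The response must be a factor for J48 to treat this as classification.
  fit <- RWeka::J48(BRND ~ PRI + PROM + FORM, data = test.df)
  print(fit)
} else {
  message("RWeka is not installed; install.packages('RWeka') would be needed.")
}
```

With a single row per brand the fitted tree may well be very small, so it is worth printing it before drawing any conclusions.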
Best Regards,
Bhupendrasinh Thakre
On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld <vikr at mindspring.com> wrote:
> I'm working with some data from which a client would like to make a decision tree predicting brand preference based on inputs such as price, speed, etc. After running the decision tree analysis using rpart, it appears that this data is not capable of predicting brand preference.
>
> Here's the data set:
>
> BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
> Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
> Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
> Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
> Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
> Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
> Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
> Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
> Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
> Brand 17  0.7193  0.3333  0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
> Brand 22  0.75    0.3333  0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167
>
> Here are my R commands:
>
>> library(rpart)
>> test.df = read.csv("test.csv")
>> head(test.df)
>      BRND    PRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
>
>> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE + SPED + REVW, method="class", data=test.df)
>
>> printcp(testTree)
>
> Classification tree:
> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC +
> MODE + SPED + REVW, data = test.df, method = "class")
>
> Variables actually used in tree construction:
> [1] FORM
>
> Root node error: 21/22 = 0.95455
>
> n= 22
>
>         CP nsplit rel error xerror xstd
> 1 0.047619      0   1.00000 1.0476    0
> 2 0.010000      1   0.95238 1.0476    0
>
> I note that only one variable (FORM) was actually used in tree construction. When I run a plot using:
>
>> plot(testTree)
>> text(testTree)
>
> ...I get a tree with one branch.
>
> It looks to me like I'm doing everything right, and this data is just not capable of predicting brand preference.
>
> Am I missing anything?
>
> Thanks very much in advance for any thoughts!
>
> -Vik