[R] Rpart query
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Tue Oct 12 11:47:52 CEST 2010
On Mon, 11 Oct 2010, jagdeesh_mn wrote:
>
> Hi,
>
> Being a novice this is my first usage of R.
>
> I am trying to use rpart for building a decision tree in R. And I have the
> following dataframe
>
>
> Outlook Temp Humidity Windy Class
> Sunny 75 70 Yes Play
> Sunny 80 90 Yes Don't Play
> Sunny 85 85 No Don't Play
> Sunny 72 95 No Don't Play
> Sunny 69 70 No Play
> Overcast 72 90 Yes Play
> Overcast 83 78 No Play
> Overcast 64 65 Yes Play
> Overcast 81 75 No Play
>
> The first line indicating the header. When I use the formula,
>
> "CART<-rpart(Class ~ Outlook + Temp + Humidity + Windy, data=dataframe)"
>
> and trying to plot the values of CART using plot(CART), I get the following
> error,
>
> "Error in plot.rpart(CART) : fit is not a tree, just a root".
>
> Am I missing something here? Any help would be greatly appreciated. Btw, the
> dataframe was obtained by reading a csv which shouldn't be an issue.
The error message says it all: on this tiny data set rpart() decides not
to split the data at all, so the fit consists of a root node only rather
than a tree.
If you want to make rpart() split the data, you can modify some of its
hyperparameters, e.g., the minimum number of observations required to
attempt a split.
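As a minimal sketch of this point, the nine rows quoted above can be typed
in by hand (the object names "golf", "root_only" and "split_fit" are made
up for this illustration, not taken from the original post). With the
defaults, rpart() requires minsplit = 20 observations in a node before it
attempts a split, which nine rows can never satisfy. Lowering minsplit via
rpart.control() lets it split:

```r
library("rpart")

## The nine rows quoted in the question, entered by hand
## (hypothetical object name; the poster read them from a csv)
golf <- data.frame(
  Outlook  = factor(c(rep("Sunny", 5), rep("Overcast", 4))),
  Temp     = c(75, 80, 85, 72, 69, 72, 83, 64, 81),
  Humidity = c(70, 90, 85, 95, 70, 90, 78, 65, 75),
  Windy    = factor(c("Yes", "Yes", "No", "No", "No",
                      "Yes", "No", "Yes", "No")),
  Class    = factor(c("Play", "Don't Play", "Don't Play", "Don't Play",
                      "Play", "Play", "Play", "Play", "Play"))
)

## Defaults: minsplit = 20 exceeds the 9 available rows,
## so no split is attempted and only the root is kept
root_only <- rpart(Class ~ Outlook + Temp + Humidity + Windy, data = golf)
nrow(root_only$frame)

## Allow splits in nodes with as few as 2 observations
split_fit <- rpart(Class ~ Outlook + Temp + Humidity + Windy, data = golf,
                   control = rpart.control(minsplit = 2))
nrow(split_fit$frame)
```

The "frame" component has one row per node, so a single row means "just a
root". Once the fit has more than one node, plot(split_fit) followed by
text(split_fit) should no longer raise the error.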
The data above are often used in machine learning textbooks to introduce
the concept of recursive partitioning. They are also provided in the
"RWeka" package. However, many (statistical) recursive partitioning
algorithms will by default consider the data too small to attempt
splitting.
## load RWeka and data
library("RWeka")
weather <- read.arff(system.file("arff", "weather.arff",
  package = "RWeka"))
## J4.8 tree (Java implementation of C4.5, revision 8)
j48 <- J48(play ~ ., data = weather)
j48
## RPart tree (R implementation of CART)
library("rpart")
rp <- rpart(play ~ ., data = weather, minsplit = 5)
plot(rp)
text(rp)
## Conditional inference tree
library("party")
ct <- ctree(play ~ ., data = weather,
  control = ctree_control(minsplit = 5, mincriterion = 0.3))
plot(ct)
As you see, all trees have different opinions about how the data should be
split. However, in this tiny data set, nothing could be considered
statistically significant.
I would recommend using a larger data set to understand how the
different algorithms work.
hth,
Z
> -Jagdeesh
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Rpart-query-tp2991198p2991198.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>