[R] Rpart query

Achim Zeileis Achim.Zeileis at uibk.ac.at
Tue Oct 12 11:47:52 CEST 2010


On Mon, 11 Oct 2010, jagdeesh_mn wrote:

>
> Hi,
>
> Being a novice this is my first usage of R.
>
> I am trying to use rpart for building a decision tree in R. And I have the
> following dataframe
>
>
> Outlook	Temp	Humidity	Windy	Class
> Sunny	75	70	Yes	Play
> Sunny	80	90	Yes	Don't Play
> Sunny	85	85	No	Don't Play
> Sunny	72	95	No	Don't Play
> Sunny	69	70	No	Play
> Overcast	72	90	Yes	Play
> Overcast	83	78	No	Play
> Overcast	64	65	Yes	Play
> Overcast	81	75	No	Play
>
> The first line indicating the header. When I use the formula,
>
> "CART<-rpart(Class ~ Outlook + Temp + Humidity + Windy, data=dataframe)"
>
> and trying to plot the values of CART using plot(CART), I get the following
> error,
>
> "Error in plot.rpart(CART) : fit is not a tree, just a root".
>
> Am I missing something here? Any help would be greatly appreciated. Btw, the
> dataframe was obtained by reading a csv which shouldn't be an issue.

The error message says it all: in this tiny data set rpart() decides not 
to split the data at all and thus returns just a root rather than a 
tree.

If you want to make rpart() split the data, you can modify some of its 
hyperparameters, e.g., the minimum number of observations required to 
attempt a split ("minsplit", which defaults to 20 in rpart.control() and 
thus exceeds your 9 observations).
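
A minimal sketch of this for the nine rows quoted above, assuming they are 
typed in by hand rather than read from your csv. The name "golf" is made up 
here, and minsplit = 2, minbucket = 1, xval = 0 are illustrative values 
only, not a recommendation:

```r
library("rpart")

## the nine observations from the original post, entered by hand
## ("golf" is a hypothetical name, not from the original post)
golf <- data.frame(
  Outlook  = factor(c("Sunny", "Sunny", "Sunny", "Sunny", "Sunny",
                      "Overcast", "Overcast", "Overcast", "Overcast")),
  Temp     = c(75, 80, 85, 72, 69, 72, 83, 64, 81),
  Humidity = c(70, 90, 85, 95, 70, 90, 78, 65, 75),
  Windy    = factor(c("Yes", "Yes", "No", "No", "No",
                      "Yes", "No", "Yes", "No")),
  Class    = factor(c("Play", "Don't Play", "Don't Play", "Don't Play",
                      "Play", "Play", "Play", "Play", "Play"))
)

## default settings: minsplit = 20 exceeds the 9 observations,
## so no split is attempted and only the root node is returned
cart0 <- rpart(Class ~ ., data = golf)
nrow(cart0$frame)  ## number of nodes in the fitted tree

## relaxed settings (illustrative): allow splits in very small nodes
## and skip the cross-validation, which means little with 9 rows
cart1 <- rpart(Class ~ ., data = golf,
  control = rpart.control(minsplit = 2, minbucket = 1, xval = 0))
nrow(cart1$frame)

## plotting only works once the tree has at least one split
plot(cart1)
text(cart1)
```

With settings this permissive the tree can essentially memorize the 
sample, so the result should be read as a demonstration of the mechanics 
only.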

The data above are often used in machine learning textbooks to introduce 
the concept of recursive partitioning. They are also provided in the 
"RWeka" package. However, many (statistical) recursive partitioning 
algorithms will by default consider the data too small to attempt 
splitting.

## load RWeka and data
library("RWeka")
weather <- read.arff(system.file("arff", "weather.arff",
   package = "RWeka"))

## J4.8 tree (Java implementation of C4.5, revision 8)
j48 <- J48(play ~ ., data = weather)
j48

## RPart tree (R implementation of CART)
library("rpart")
rp <- rpart(play ~ ., data = weather, minsplit = 5)
plot(rp)
text(rp)

## Conditional inference tree
library("party")
ct <- ctree(play ~ ., data = weather,
   control = ctree_control(minsplit = 5, mincriterion = 0.3))
plot(ct)

As you see, all trees have different opinions about how the data should be 
split. However, in this tiny data set, nothing could be considered 
statistically significant.

I would recommend using a larger data set to understand how the 
different algorithms work.

hth,
Z

> -Jagdeesh