[R] multi-class classification using rpart

Liaw, Andy andy_liaw at merck.com
Tue Jan 25 20:58:04 CET 2005


> From: WeiWei Shi
> 
> Hi,
> I am trying to build a multi-class classification tree using rpart.
> I used the fgl data from the MASS package as a test, and it works well.
> 
> However, when I use my small-sampled data as below, the program seems
> to take forever. I am not sure whether it is just slow or whether there is
> something wrong with my code or data manipulation.
> 
> Please advise!
> 
> The data are described by the output of the str() function. The call to
> rpart is:
> 
> library(rpart)
> test_tree <- rpart(V142 ~ ., data = x,
>                    parms = list(split = 'gini'), cp = 0.01)
> 
> the response variable is $V142, with 3 levels.
> 
> Thanks for your suggestions!
> 
> Ed.

[snip]

>  $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59 
> 59 55 78 7 73 ...

I'd bet this is the problem.  There are 2^(88-1) - 1 possible binary splits
of a factor with 88 levels.  rpart will work on those splits till the cows
come home...
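
To put a number on that, here is a small sketch of the count. The helper
name n_splits is made up for illustration and is not part of rpart: each
of the k levels can go left or right (2^k assignments), the left/right
labels are interchangeable (divide by 2), and the split that leaves one
side empty is dropped.

```r
# Hypothetical helper (not an rpart function): number of distinct binary
# splits of an unordered factor with k levels.
n_splits <- function(k) 2^(k - 1) - 1

n_splits(3)    # 3
n_splits(6)    # 31
n_splits(88)   # roughly 1.5e26
```

The count doubles with every extra level, so dropping even a handful of
levels cuts the search considerably.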

I'd suggest getting rid of that variable, or collapsing its levels into
something more reasonable.  The CART book describes some shortcuts for
testing only n-1 splits for factors with n levels, but I believe that only
works for 2-class problems.
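
One possible way to collapse the levels, as a rough sketch only: keep the
most frequent levels and lump the rest together. The helper
collapse_levels, the cutoff of 10, and the toy data standing in for V141
are all assumptions made for illustration; the right grouping depends on
what the levels actually mean.

```r
# Hypothetical helper: keep the `keep` most frequent levels of a factor
# and lump all remaining levels into a single "other" level.
collapse_levels <- function(f, keep = 10, other = "other") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% top)] <- other
  factor(out)
}

# Made-up stand-in for the 88-level column V141.
set.seed(1)
v141 <- factor(sample(1001:1088, 500, replace = TRUE))
v141_small <- collapse_levels(v141, keep = 10)
nlevels(v141_small)   # at most 11: the 10 kept levels plus "other"
```

On the real data this would be something like
x$V141 <- collapse_levels(x$V141) before the rpart call, or simply
x$V141 <- NULL to drop the variable.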

Andy



