[R] multi-class classification using rpart

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jan 26 05:26:14 CET 2005


On Tue, 25 Jan 2005, Liaw, Andy wrote:

>> From: WeiWei Shi
>>
>> Hi,
>> I am trying to make a multi-class classification tree by using rpart.
>> I used the MASS package's data set fgl to test it, and it works well.
>>
>> However, when I used my own small sample of data, described below, the
>> program seems to take forever. I am not sure whether it is just slow or
>> whether there is something wrong with my code or data manipulation.
>>
>> Please advise!
>>
>> The data is described by the output of the str() function. The call to
>> rpart is:
>>
>> library(rpart)
>> test_tree <- rpart(V142 ~ ., data = x,
>>                    parms = list(split = 'gini'), cp = 0.01)
>>
>> The response variable is V142, a factor with 3 levels.
>>
>> Thanks for your suggestions!
>>
>> Ed.
>
> [snip]
>
>>  $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59
>> 59 55 78 7 73 ...
>
> I'd bet this is the problem.  There are 2^(88-1) - 1 possible ways to split
> a factor with 88 levels.  It will work on those splits til the cows come
> home...
>
> I'd suggest getting rid of that variable, or collapsing the levels to
> something more reasonable.  The CART book describes some heuristic shortcuts
> for testing only n-1 splits for factors with n levels, but I believe that
> only works for 2-class problems, if I'm not mistaken.
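
The split count quoted above, and one way of collapsing levels, can be
sketched in R as follows. This is only an illustration: n_binary_splits()
and collapse_levels() are hypothetical helpers, not part of rpart, the
choice of keeping the 10 most frequent levels is arbitrary, and the vector
v141 is made-up data standing in for the poster's V141.

```r
# Number of distinct binary splits of an unordered factor with n levels.
n_binary_splits <- function(n) 2^(n - 1) - 1

n_binary_splits(3)    # 3
n_binary_splits(88)   # roughly 1.5e26

# Hypothetical helper: keep the `keep` most frequent levels and pool
# all remaining levels into a single level named `other`.
collapse_levels <- function(f, keep = 10, other = "other") {
  f <- as.factor(f)
  counts <- table(f)
  ord <- order(counts, decreasing = TRUE)
  top <- names(counts)[ord][seq_len(min(keep, length(counts)))]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% top)] <- other
  factor(out)
}

# Made-up data with 88 possible codes, standing in for V141.
set.seed(1)
v141 <- factor(sample(1001:1088, 500, replace = TRUE))
v141_small <- collapse_levels(v141, keep = 10)
nlevels(v141_small)   # at most 11: ten kept levels plus "other"

# Possible use before fitting (assumes a data frame x as in the post):
# x$V141 <- collapse_levels(x$V141)
# test_tree <- rpart(V142 ~ ., data = x)
```

Pooling by frequency is only one choice; grouping the codes by subject
knowledge may give a more meaningful factor.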

You don't need heuristics: there is a fast algorithm (proved in my PRNN 
book) for two classes only.  I believe rpart implements it.
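
A minimal sketch of the two-class shortcut as it is usually described:
sort the levels by the proportion of cases in one class, then examine only
the n-1 splits that cut that ordering. This is not rpart's own code; the
function names and the simulated data are assumptions made for the
illustration, and the Gini index is used because the original call asked
for it. On a small factor the result can be checked against brute force.

```r
# Gini impurity of a 0/1 response vector.
gini <- function(y) { p <- mean(y); 2 * p * (1 - p) }

# Weighted impurity of the split that sends the levels in `left` to the
# left child and all other levels to the right child.
split_impurity <- function(f, y, left) {
  in_left <- f %in% left
  (sum(in_left) * gini(y[in_left]) +
     sum(!in_left) * gini(y[!in_left])) / length(y)
}

# Shortcut: order the levels by class-1 proportion, try n - 1 cut points.
best_ordered_split <- function(f, y) {
  p <- tapply(y, f, mean)
  lev <- names(p)[order(p)]
  imp <- vapply(seq_len(length(lev) - 1),
                function(k) split_impurity(f, y, lev[seq_len(k)]),
                numeric(1))
  min(imp)
}

# Brute force: every one of the 2^(n-1) - 1 binary splits.
best_exhaustive_split <- function(f, y) {
  lev <- levels(f)
  n <- length(lev)
  best <- Inf
  for (m in seq_len(2^(n - 1) - 1)) {
    # Binary digits of m choose the left-hand levels; the last level
    # always stays on the right, so each split is visited once.
    left <- lev[(m %/% 2^(seq_len(n) - 1)) %% 2 == 1]
    best <- min(best, split_impurity(f, y, left))
  }
  best
}

# Simulated two-class data with a 6-level factor (31 possible splits).
set.seed(42)
f <- factor(sample(letters[1:6], 200, replace = TRUE))
y <- rbinom(200, 1, prob = seq(0.1, 0.9, length.out = 6)[as.integer(f)])

isTRUE(all.equal(best_ordered_split(f, y), best_exhaustive_split(f, y)))
```

With three response classes, as in the original post, there is no single
proportion to sort by, so this ordering argument does not carry over.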

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



