[R] multi-class classification using rpart
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jan 26 05:26:14 CET 2005
On Tue, 25 Jan 2005, Liaw, Andy wrote:
>> From: WeiWei Shi
>>
>> Hi,
>> I am trying to make a multi-class classification tree by using rpart.
>> I tested it on the fgl data from the MASS package and it works well.
>>
>> However, when I use my own small sample of data (described below), the
>> program seems to take forever. I am not sure whether it is just slow or
>> whether something is wrong with my code or data manipulation.
>>
>> Please advise!
>>
>> The structure of the data is shown by the output of str(). The call to
>> rpart is:
>>
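>> library(rpart)
>> # Name the response as V142, not x$V142: otherwise "." (all columns of x)
>> # would also pull the response itself in as a predictor.
>> test_tree <- rpart(V142 ~ ., data = x, method = "class",
>>                    parms = list(split = "gini"), cp = 0.01)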
>>
>> The response variable is V142, a factor with 3 levels.
>>
>> Thanks for your suggestions!
>>
>> Ed.
>
> [snip]
>
>> $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59
>> 59 55 78 7 73 ...
>
> I'd bet this is the problem. There are 2^(88-1) - 1 possible ways to split
> a factor with 88 levels. It will work on those splits till the cows come
> home...
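To put the 2^(88-1) - 1 figure in perspective, here is a small R sketch (`count_splits_brute` is a hypothetical helper written only for illustration) that enumerates the distinct left/right splits of a factor with a few levels by brute force, checks the count against 2^(n-1) - 1, and then evaluates the formula for 88 levels.

```r
# Hypothetical helper: count the distinct binary splits of a factor with
# n levels (n >= 2) by brute force. A split sends a non-empty, proper subset
# of levels left and the rest right; a split and its mirror image are the
# same split, so each one is labelled by the side that contains level 1.
count_splits_brute <- function(n) {
  lv <- seq_len(n)
  seen <- character(0)
  for (k in seq_len(n - 1)) {
    for (left in combn(n, k, simplify = FALSE)) {
      right <- setdiff(lv, left)
      side1 <- if (1 %in% left) left else right
      seen <- union(seen, paste(side1, collapse = ","))
    }
  }
  length(seen)
}

# Brute force agrees with the closed form 2^(n-1) - 1 for small n ...
small <- 2:8
stopifnot(all(sapply(small, count_splits_brute) == 2^(small - 1) - 1))

# ... so for an 88-level factor such as V141 the count is enormous
2^(88 - 1) - 1   # roughly 1.5e+26 candidate splits
```

Even at a billion split evaluations per second, exhausting that many candidates for one variable at one node is hopeless.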
>
> I'd suggest getting rid of that variable, or collapsing its levels into
> something more reasonable. The CART book describes some heuristic shortcuts
> for testing only n-1 splits for factors with n levels, but I believe they
> only work for 2-class problems.
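A minimal sketch of the "collapse the levels" suggestion. The helper `lump_levels`, the cutoff of 10 levels and the level name "Other" are hypothetical choices made for illustration, not anything prescribed in this thread, and the data frame `toy` is simulated as a stand-in because the original data are not available. The block first lists factor columns by number of levels (to spot variables like V141), then keeps the most frequent levels and lumps the rest together.

```r
# Hypothetical helper: keep the `keep` most frequent levels of a factor and
# lump all remaining (non-missing) values into one level named `other`.
lump_levels <- function(f, keep = 10, other = "Other") {
  counts <- sort(table(f), decreasing = TRUE)      # level frequencies, NAs excluded
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  v <- as.character(f)
  v[!is.na(v) & !(v %in% top)] <- other            # NAs stay NA
  factor(v)
}

# Simulated stand-in for the question's data frame (not the real data)
set.seed(1)
toy <- data.frame(
  V141 = factor(sample(as.character(1001:1088), 500, replace = TRUE)),
  V142 = factor(sample(c("A", "B", "C"), 500, replace = TRUE))
)

# List factor columns by number of levels to spot high-cardinality ones
sort(sapply(Filter(is.factor, toy), nlevels), decreasing = TRUE)

toy$V141 <- lump_levels(toy$V141)
nlevels(toy$V141)   # 10 kept levels plus "Other"
```

On the question's data this would be `x$V141 <- lump_levels(x$V141)` before calling rpart again; an 11-level factor leaves 2^10 - 1 = 1023 candidate splits per node instead of about 1.5e+26. Lumping by frequency is only one option; grouping levels by subject-matter meaning is often better.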
You don't need heuristics: there is a fast algorithm (proved in my PRNN
book) for two classes only. I believe rpart implements it.
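For two classes, the idea behind the fast algorithm is that if the levels are sorted by the proportion of one class within each level, the best split (for impurity measures such as the Gini index) is one of the n - 1 splits that cut this ordered list into a prefix and a suffix, so no heuristic is needed. Below is a minimal sketch under that assumption, using the Gini index; `best_factor_split` and the other helpers are hypothetical, written for illustration, and are not rpart's internal code. The block compares the shortcut against exhaustive search on simulated data.

```r
# Gini impurity of a two-class factor y (0 for an empty or pure node)
gini2 <- function(y) {
  if (length(y) == 0) return(0)
  p <- mean(y == levels(y)[1])
  2 * p * (1 - p)
}

# Weighted impurity of the two children after sending levels `left` left
split_impurity <- function(f, y, left) {
  inl <- f %in% left
  (sum(inl) * gini2(y[inl]) + sum(!inl) * gini2(y[!inl])) / length(y)
}

# Hypothetical helper: best split of factor f for a two-class response y,
# scanning only the n - 1 prefix splits of levels ordered by P(class 1)
best_factor_split <- function(f, y) {
  stopifnot(nlevels(y) == 2)
  f <- droplevels(f)
  p1 <- tapply(y == levels(y)[1], f, mean)  # one value per level of f
  ord <- levels(f)[order(p1)]
  imp <- sapply(seq_len(length(ord) - 1),
                function(k) split_impurity(f, y, ord[seq_len(k)]))
  k <- which.min(imp)
  list(left = ord[seq_len(k)], impurity = imp[k])
}

# Exhaustive search over all 2^(n-1) - 1 splits, for comparison
brute_factor_split <- function(f, y) {
  lv <- levels(droplevels(f))
  best <- Inf
  for (k in seq_len(length(lv) - 1))
    for (left in combn(lv, k, simplify = FALSE))
      best <- min(best, split_impurity(f, y, left))
  best
}

# Simulated two-class data with an 8-level factor
set.seed(42)
f <- factor(sample(letters[1:8], 300, replace = TRUE))
y <- factor(ifelse(runif(300) < as.integer(f) / 9, "yes", "no"))
fast <- best_factor_split(f, y)
all.equal(fast$impurity, brute_factor_split(f, y))  # TRUE: same optimum
```

The shortcut evaluates 7 splits here against 127 for the exhaustive search; for 88 levels it would be 87 against about 1.5e+26. With three or more classes there is no single ordering of the levels, which is why the question's 3-class response does not benefit.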
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help mailing list