[R] randomForest and ordered factors

Liaw, Andy andy_liaw at merck.com
Tue Apr 29 16:29:19 CEST 2008


If you are using the latest version (4.5-25), you will see in rfNews() that this is a known problem I still need to fix.  The package used to handle ordered factors, but the more stringent checks for factor-level consistency introduced in 4.5-23 broke support for ordered factors in prediction.

From the code you've shown, it looks like you are just growing the forest to evaluate variable importance or other things, instead of predicting other data (since you set keep.forest=FALSE).  If that's the case, you should be fine, as the problem only happens when you try to call predict() with models that contain ordered factors as predictors.
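
To illustrate the importance-only use described above, here is a minimal sketch. The data frame `d` and its columns (`sex`, `size`, `len`) are made up for illustration and are not the poster's data; the only package calls used are randomForest() and importance(). With keep.forest=FALSE the trees are discarded after fitting, so the fitted object can report importance but has no forest to pass to predict().

```r
library(randomForest)

set.seed(1)
# Hypothetical toy data: a two-level response, one ordered factor, one numeric.
d <- data.frame(
  sex  = factor(sample(c("f", "m"), 100, replace = TRUE)),
  size = ordered(sample(c("small", "medium", "large"), 100, replace = TRUE),
                 levels = c("small", "medium", "large")),
  len  = rnorm(100)
)

# Grow the forest for importance only; the trees themselves are not kept.
rf <- randomForest(sex ~ ., data = d, importance = TRUE,
                   keep.forest = FALSE, ntree = 200)

imp <- importance(rf)   # matrix with one row per predictor
print(imp)

# No forest is stored, so predict() on new data is not possible here.
is.null(rf$forest)
```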

(Ordered factors are essentially treated as numeric variables in RF: trees use only the ranks of numeric variables, so as predictors there is effectively no difference between ordered factors and numeric variables.)
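
Since only the ranks matter, one possible workaround until predict() is fixed is to convert ordered factors to their integer level codes before fitting. The sketch below uses base R only; the helper name `ordered_to_rank` is hypothetical and not part of randomForest. It assumes that the training data and any new data define the ordered factors with the same levels in the same order, since otherwise the codes would not match.

```r
# An ordered factor and its integer level codes (ranks).
size <- ordered(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
as.integer(size)   # 1 3 2 1

# Hypothetical helper: replace every ordered-factor column of a data frame
# with its integer level codes, leaving all other columns untouched.
ordered_to_rank <- function(df) {
  is_ord <- vapply(df, is.ordered, logical(1))
  df[is_ord] <- lapply(df[is_ord], as.integer)
  df
}

d  <- data.frame(size = size, len = c(1.2, 3.4, 2.2, 0.9))
d2 <- ordered_to_rank(d)
str(d2)
```

Apply the same conversion to the training data and to the data passed to predict(), so both see the same numeric coding.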

Andy 

From: Birgit Lemcke
> 
> Hello R-user!
> 
> I am running R 2.7.0 on a PowerBook (Mac OS X Tiger). (I am still a
> beginner in R and statistics.)
> 
> I am trying to find the variables that are most important for separating
> my dataset into the groups given by a categorical variable.
> 
> code:
> 
> Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4,  
> importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000,  
> keep.forest=FALSE)
> 
> My dataset also contains ordered factors, declared as such.
> Is randomForest able to deal with them? Does it change anything, or is
> there no difference between using factors and ordered factors?
> 
> Many thanks in advance
> 
> B.
> 
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> birgit.lemcke at systbot.uzh.ch