[R] Using categorical variables in package randomForest.

Liaw, Andy andy_liaw at merck.com
Tue Mar 13 20:18:16 CET 2012


The way to represent categorical variables is with factors; see ?factor. randomForest() will handle factors appropriately, as do most modeling functions in R.
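A minimal sketch of this advice, assuming a data frame laid out like the one described in the question below. The column names x_bin, x_cat, x_num and y are hypothetical and the data are simulated for illustration. Integer-coded categorical columns (and the 0/1 target) are converted with factor() before fitting, so they are no longer treated as numbers; the fitting step runs only if the randomForest package is installed.

```r
# Hypothetical toy data: the column names and values are illustrative only.
set.seed(1)
n <- 200
df <- data.frame(
  x_bin = sample(c(0, 1), n, replace = TRUE),   # 2 categories coded as 0/1
  x_cat = sample(1:10, n, replace = TRUE),      # up to 10 categories coded as integers
  x_num = rnorm(n),                             # a genuinely numeric predictor
  y     = sample(c(0, 1), n, replace = TRUE)    # two-class target coded as 0/1
)

# Integer codes are stored as numbers, so convert the categorical
# predictors and the target to factors.
cat_cols <- c("x_bin", "x_cat", "y")
df[cat_cols] <- lapply(df[cat_cols], factor)

str(df)  # x_bin, x_cat and y are now listed as factors

# Fit only if randomForest is available; a factor response gives a classifier.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(y ~ ., data = df)
  print(rf$type)  # expected to report "classification" since y is a factor
}
```

Because y is a factor, randomForest() builds a classification forest rather than a regression forest; the same factor() conversion applies to the categorical predictors.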

Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of abhishek
> Sent: Tuesday, March 13, 2012 8:11 AM
> To: r-help at r-project.org
> Subject: [R] Using categorical variables in package randomForest.
> 
> Hello,
> 
> I am sorry if there are already posts that answer this question, but I
> tried to find them before making this post and did not find any
> relevant ones.
> 
> I am using the randomForest package to build a two-class classifier.
> My data contain both categorical and numerical variables. Different
> categorical variables have different numbers of categories, from 2 to
> 10, and I am not sure how to represent the categorical data. For
> example, I am using 0 and 1 for variables that have only two
> categories, but I suspect the program is analysing these values as
> numerical. Do you have any idea how I can use the categorical
> variables to build a two-class classifier? I am using a factor
> consisting of 0 and 1 for the classification target.
> 
> Thank you for your ideas.
> 
> -----
> abhishek
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Using-caegorical-variables-in-package-randomForest-tp4468923p4468923.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> 


