[R-sig-eco] handling ca

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Aug 7 11:23:03 CEST 2008


On Wed, 2008-08-06 at 13:51 -0400, Griffith.Michael at epamail.epa.gov
wrote:
> In trying to use randomForest, I got the following error message:
> 
> Error in data.matrix(x) : non-numeric data type in frame
> 
> Am I correct that this means that randomForest has not been written in R
> to handle categorical predictor variables?  Is there a way around this?
> I am working with two categorical variables (out of 4 predictor
> variables) with more than 2 levels that do not have any particular order
> to them.  The instructions for the original random forest program by
> Breiman indicate that it handles categorical predictor variables, so I
> am surprised that the R version does not.

No, randomForest handles factors, as this simple example shows:

> library(randomForest)
> dat <- data.frame(matrix(rnorm(1000), ncol = 10))
> dat$fac <- gl(4,25)
> head(dat)
           X1         X2         X3         X4          X5         X6
1 -0.15048037  1.2497460 -0.7728316 -0.3286552  1.59056488 -1.2579715
2 -0.67688208 -2.0189794 -0.3154595  0.5998583 -1.89438803 -0.9737503
3  1.02637837  0.3724476 -0.3145720  1.4510331  1.78757305  0.4365752
4  0.08031081  0.6534088 -0.6211070  0.1432012 -0.51041876 -1.0198103
5  0.09208803  0.6273971  0.7333440  0.4362220 -0.03848859  0.6260701
6 -1.41415813 -1.1515418 -0.7457416  1.5853533 -1.17111942  2.5486069
          X7         X8         X9        X10 fac
1  0.7698208 -1.8697214 -1.1568065  0.8459625   1
2 -0.2782257  0.1361337 -1.1308822  0.6001056   1
3 -1.0053869  0.5940746 -0.1833341  2.0251286   1
4 -0.9806460 -1.5225105 -1.8038346  0.2879445   1
5 -0.3767947 -1.8172355  1.1956810  1.2158483   1
6 -0.9316282  2.1180183 -0.6357269 -1.3134966   1
> dat$fac
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> dat$resp <- rnorm(100, 2)
> forest <- randomForest(resp ~ ., data = dat)
>

As you have given us only an error message we have very little to go
on, but the first thing I would check is that your data are what you
think they are. Variables you expect to be factors can end up stored as
characters when data are read into R, so that would be my first port of
call. What does:

str(mydata)

return, where mydata is the object holding your data?
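A minimal sketch of that check. The data frame below is hypothetical, standing in for mydata, and assumes the two categorical predictors arrived as character columns (in R 4.0.0 and later, read.table() and read.csv() default to stringsAsFactors = FALSE, so this is the usual outcome). Only the character columns are converted to factors; numeric predictors are left alone.

```r
# Hypothetical stand-in for 'mydata': two categorical predictors stored as
# character vectors, plus two numeric predictors
mydata <- data.frame(
  site    = c("A", "B", "C", "A", "B", "C"),
  habitat = c("riffle", "pool", "run", "pool", "run", "riffle"),
  depth   = c(0.2, 0.8, 0.5, 0.9, 0.4, 0.3),
  temp    = c(12.1, 13.4, 12.8, 14.0, 13.1, 12.5),
  stringsAsFactors = FALSE
)

str(mydata)   # 'site' and 'habitat' are listed as chr, not Factor

# Find the character columns and convert each one to a factor
is_chr <- vapply(mydata, is.character, logical(1))
mydata[is_chr] <- lapply(mydata[is_chr], factor)

str(mydata)   # 'site' and 'habitat' are now factors with 3 levels each
```

Alternatively, passing stringsAsFactors = TRUE when reading the file converts all text columns at once.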

Note that the error comes from data.matrix(). If you consult the help
page for that function you will see that it is the preferred way of
converting a data frame to a numeric matrix, replacing each factor by
its internal integer codes. Clearly something in your data is neither a
factor nor numeric; otherwise this standard R function would not have
raised an error.
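To illustrate what data.matrix() does with a factor, here is a small hypothetical data frame. The factor column is replaced by its integer codes, numbered in the order of levels(df$f). According to its help page, current versions of R also convert character columns (first to factors, then to integers), so the exact error quoted above may not appear with a recent R installation.

```r
# Hypothetical data frame with one numeric and one factor column
df <- data.frame(x = c(1.5, 2.5, 3.5),
                 f = factor(c("low", "high", "low")))

levels(df$f)        # levels are sorted: "high" then "low"

# data.matrix() keeps 'x' as is and replaces 'f' by its integer codes
m <- data.matrix(df)
m                   # column f holds 2, 1, 2 (low = 2, high = 1)
```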

HTH

G

> 
> Michael
> 
> Michael B. Griffith, Ph.D.
> Research Ecologist
> 
> USEPA, NCEA (MS A-110)
> 26 W. Martin Luther King Dr.
> Cincinnati, OH  45268
> 
> telephone:  513 569-7034
> e-mail:  griffith.michael at epa.gov
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%