[R-sig-eco] handling ca
Gavin Simpson
gavin.simpson at ucl.ac.uk
Thu Aug 7 11:23:03 CEST 2008
On Wed, 2008-08-06 at 13:51 -0400, Griffith.Michael at epamail.epa.gov
wrote:
> In trying to use randomForest, I got the following error message:
>
> Error in data.matrix(x) : non-numeric data type in frame
>
> Am I correct that this means that randomForest has not been written in R
> to handle categorical predictor variables? Is there a way around this?
> I am working with two categorical variables (out of 4 predictor
> variables) with more than 2 levels that do not have any particular order
> to them. The instructions for the original random forest program by
> Breiman indicate that it handles categorical predictor variables, so I
> am surprised that the R version does not.
No, randomForest handles factors, as this simple example shows:
> library(randomForest)
> dat <- data.frame(matrix(rnorm(1000), ncol = 10))
> dat$fac <- gl(4,25)
> head(dat)
           X1         X2         X3         X4          X5         X6
1 -0.15048037  1.2497460 -0.7728316 -0.3286552  1.59056488 -1.2579715
2 -0.67688208 -2.0189794 -0.3154595  0.5998583 -1.89438803 -0.9737503
3  1.02637837  0.3724476 -0.3145720  1.4510331  1.78757305  0.4365752
4  0.08031081  0.6534088 -0.6211070  0.1432012 -0.51041876 -1.0198103
5  0.09208803  0.6273971  0.7333440  0.4362220 -0.03848859  0.6260701
6 -1.41415813 -1.1515418 -0.7457416  1.5853533 -1.17111942  2.5486069
          X7         X8         X9        X10 fac
1  0.7698208 -1.8697214 -1.1568065  0.8459625   1
2 -0.2782257  0.1361337 -1.1308822  0.6001056   1
3 -1.0053869  0.5940746 -0.1833341  2.0251286   1
4 -0.9806460 -1.5225105 -1.8038346  0.2879445   1
5 -0.3767947 -1.8172355  1.1956810  1.2158483   1
6 -0.9316282  2.1180183 -0.6357269 -1.3134966   1
> dat$fac
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
[38] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[75] 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> dat$resp <- rpois(100, 2)  # overwritten by the next line
> dat$resp <- rnorm(100, 2)
> forest <- randomForest(resp ~ ., data = dat)
>
As all you give is an error message, we have very little to go on, but
the first thing I would check is that your data are as you think they
should be. Factors can end up as character vectors when data are read
into R, so that'd be my first port of call. What does:
str(mydata)
return, where mydata is the object holding your data?
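For illustration, here is a minimal sketch of that check. The data frame
below is made up (its column names and values are assumptions, not the
poster's data); it only shows how a character column shows up in str()
and one way to turn such columns into factors:

```r
# Hypothetical stand-in for the poster's data: 'habitat' is deliberately
# stored as character rather than factor.
mydata <- data.frame(
  resp    = c(1.2, 3.4, 2.2, 5.1),
  habitat = c("pool", "riffle", "run", "pool"),
  stringsAsFactors = FALSE
)

# str() lists the class of every column; look for 'chr' where you
# expected 'Factor'.
str(mydata)

# Convert every character column to a factor, leaving the rest alone.
is_chr <- vapply(mydata, is.character, logical(1))
mydata[is_chr] <- lapply(mydata[is_chr], factor)

# 'habitat' should now be reported as a factor.
str(mydata)
```

If the columns were meant to be factors from the start, it may be simpler
to fix this at the point where the data are read in.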
Note that the error comes from data.matrix. If you consult the help page
for that function, you will see that it is the preferred way of
converting a data frame to a matrix while preserving the numeric
representation of factors. Clearly there is something in your data that
is neither a factor nor numeric; otherwise this standard R function
would not have given an error.
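As a small sketch of what "preserves the numeric representation of
factors" means, using a made-up two-column data frame (the names df, x
and f are only for illustration):

```r
# One numeric column and one factor column.
df <- data.frame(
  x = c(0.5, 1.5, 2.5),
  f = factor(c("b", "a", "b"))
)

# data.matrix() returns a numeric matrix; the factor is replaced by its
# internal integer codes, which follow the order of levels(df$f).
m <- data.matrix(df)

levels(df$f)  # the levels, in sorted order: "a" "b"
m[, "f"]      # the codes for "b", "a", "b"
```

The codes carry no ordering information of their own; they are only
positions within levels(df$f).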
HTH
G
>
> Michael
>
> Michael B. Griffith, Ph.D.
> Research Ecologist
>
> USEPA, NCEA (MS A-110)
> 26 W. Martin Luther King Dr.
> Cincinnati, OH 45268
>
> telephone: 513 569-7034
> e-mail: griffith.michael at epa.gov
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%