[R] Random Forest Reading N/A's, I don't see them

William Dunlap wdunlap at tibco.com
Sat Dec 17 01:52:28 CET 2011


Try randomForest with a small dataset to see how it works:
  > library(randomForest)
  > d <- data.frame(stringsAsFactors=FALSE,
  +                 Num=(1:10)%%9,
  +                 Fac=factor(rep(LETTERS[1:2],each=5)),
  +                 Char=rep(letters[24:26],len=10))
  > randomForest(x=d[,"Char",drop=FALSE], y=d$Num)
  Error in randomForest.default(x = d[, "Char", drop = FALSE], y = d$Num) : 
    NA/NaN/Inf in foreign function call (arg 1)
  In addition: Warning message:
  In data.matrix(x) : NAs introduced by coercion
  > randomForest(x=d[,"Fac",drop=FALSE], y=d$Num)

  Call:
   randomForest(x = d[, "Fac", drop = FALSE], y = d$Num) 
                 Type of random forest: regression
                       Number of trees: 500
  No. of variables tried at each split: 1

            Mean of squared residuals: 9.573558
                      % Var explained: -40.58

It appears to die if any predictors are character vectors:
it will not convert them to factors (as most modelling functions
do).
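One workaround, as a minimal sketch: convert the character columns to factors yourself before calling randomForest. The data frame d is the toy one from above; the helper name is_chr is only for illustration, and the randomForest call is guarded so the conversion part runs even where the package is not installed.

```r
# Rebuild the toy data set from the transcript above.
d <- data.frame(stringsAsFactors = FALSE,
                Num  = (1:10) %% 9,
                Fac  = factor(rep(LETTERS[1:2], each = 5)),
                Char = rep(letters[24:26], length.out = 10))

# Find the character columns and turn each one into a factor.
is_chr <- vapply(d, is.character, logical(1))
d[is_chr] <- lapply(d[is_chr], factor)
str(d)

# Assumption: the randomForest package is installed; skip the fit otherwise.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x = d[, c("Fac", "Char")], y = d$Num)
  print(fit)
}
```

With only 10 rows the fit itself is meaningless; the point is just that the call no longer sees a character column.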

as.matrix(data.frame) creates a character matrix if not all columns
are numeric or logical, so I suspect you are running into the
no-character-data limitation.  Try leaving off the as.matrix and
pass in the data.frame that it expects:
   randomForest(x=cm3[,-1,drop=FALSE], y=cm3[,1])
(There is no need or use for the data= argument if you use the x=,y=
interface.  It is only there for the formula interface.)
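To see the as.matrix effect in isolation, here is a small base-R sketch. The data frame cm is made up for illustration (it is not your cm3); it has a numeric response, a numeric predictor and a character predictor.

```r
# Hypothetical stand-in for a mixed-type data set such as cm3.
cm <- data.frame(y   = c(1.5, 2.5, 3.5),
                 x1  = c(10, 20, 30),
                 grp = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# as.matrix() on a data.frame with a character column gives a
# character matrix: the numeric columns are turned into strings too.
m <- as.matrix(cm)
print(mode(m))
print(m)

# Subsetting the data.frame directly keeps each column's own type.
x <- cm[, -1, drop = FALSE]
print(sapply(x, class))
```

So once as.matrix has been applied, even the columns that were numeric in the data.frame are no longer numeric, which is why passing the data.frame itself is the safer route.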

If you dislike the no-character-data limitation discuss it with
the person at the address given by maintainer("randomForest").

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lost in R
> Sent: Friday, December 16, 2011 2:55 PM
> To: r-help at r-project.org
> Subject: Re: [R] Random Forest Reading N/A's, I don't see them
> 
> The data set I attached was just those 10 lines. It was only meant to show
> any possible obvious mistake I may have made. The real set has the 4498 line
> of data.


