[R] Problem with Random Forest predict

Liaw, Andy andy_liaw at merck.com
Fri May 1 19:27:23 CEST 2009


This message landed in my "Junk e-mails" folder (over which I have no
control), and it just so happens that I glanced in the folder today
instead of emptying it without checking and trusting the filter
to do the Right Thing...

Since you seem to have run into a problem with predict.randomForest, one
thing you can do to investigate further is to make a copy of it and play
with that copy; i.e.,

myRFpred <- randomForest:::predict.randomForest

Then you can do debug(myRFpred) and step through the function call, and
examine the objects inside the function to see where it went astray.
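
As a rough sketch of that workflow (not a fix for the problem itself), the
copy can be called directly in place of predict(). The built-in iris data
and the small forest below are stand-ins chosen only for illustration, and
the debug() calls are commented out because they need an interactive
session.

```r
library(randomForest)

set.seed(42)
# iris is a stand-in for the real data, used only for illustration.
rf <- randomForest(x = iris[, -5], y = iris$Species, ntree = 50)

# Copy the S3 method out of the package namespace so it can be called directly.
myRFpred <- randomForest:::predict.randomForest

# Calling the copy directly runs the same code that predict(rf, ...) dispatches to.
p <- myRFpred(rf, iris[1:5, -5], type = "prob")
print(dim(p))

# In an interactive session, flag the copy for stepping and call it again:
# debug(myRFpred)
# myRFpred(rf, iris[1:5, -5], type = "prob")   # step with n, continue with c, quit with Q
# undebug(myRFpred)
```

At the Browse prompt, printing cat.new and object$forest$ncat side by side
should show which predictors disagree.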

Basically, there's little I can do unless I have a way to reproduce the
problem you see, and currently I don't.  If you can supply a (hopefully
small) portion of the data that causes the problem, then I can try to see
what's wrong.
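
When cutting the data down to a small reproducible piece, it may help to
see which columns fail the check quoted below, rather than only whether
all() is TRUE. The following is a minimal base-R sketch: count_categories
is a hypothetical helper name, and the two toy data frames are made up for
illustration. With a real forest, the first vector would be compared
against the ncat component stored in the fitted object.

```r
# Hypothetical helper mirroring the check in predict.randomForest:
# unordered factors count their levels, everything else counts as 1.
count_categories <- function(df) {
  vapply(df, function(col) {
    if (is.factor(col) && !is.ordered(col)) nlevels(col) else 1L
  }, integer(1))
}

# Toy data standing in for the real predictors (made up for illustration).
train <- data.frame(a = factor(c("x", "y", "z")),
                    b = c(1.5, 2.0, 3.5),
                    c = factor(c("lo", "hi", "lo")))

# Simulate new data in which one column has lost its factor type.
newdata <- train
newdata$c <- as.character(newdata$c)

ncat_train <- count_categories(train)
ncat_new   <- count_categories(newdata)

# Names of the columns whose category counts disagree.
mismatch <- names(ncat_train)[ncat_train != ncat_new]
print(mismatch)
```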

Best,
Andy

From: Michael Conklin
> 
> I am trying to run a partialPlot with Random Forest (as I 
> have done many times before).
> 
> First I run my forest. Cell is a 6-level factor that is the 
> dependent variable; all other variables are predictors, and most 
> of these are factors as well.
> 
> predCell <- randomForest(x=tempdata[-match("Cell",names(tempdata))],
>     y=tempdata$Cell, importance=T)
> 
> Then I try my partial plot to look at the effect of a 
> specific predictor.
> 
> partialPlot(x=predCell,
>     pred.data=tempdata[-match("Cell",names(tempdata))], x.var="P7_6")
> 
> I get this error:
> 
> Error in predict.randomForest(x, x.data, type = "prob") :
>   Type of predictors in new data do not match that of the training data.
> 
> In examining randomForest:::predict.randomForest I see the 
> following code that produces this error message.
> 
>         cat.new <- sapply(x, function(x) if (is.factor(x) &&
>             !is.ordered(x))
>             length(levels(x))
>         else 1)
>         if (!all(object$forest$ncat == cat.new))
>             stop("Type of predictors in new data do not match that of the training data.")
>     }
> 
> 
> The odd thing is that if I run this check outside of the function, it passes:
> 
> > all(predCell$forest$ncat==
> + sapply(tempdata[-match("Cell",names(tempdata))], function(x) if (is.factor(x) &&
> +             !is.ordered(x))
> +             length(levels(x))
> +         else 1))
> [1] TRUE
> 
> That result should avoid the call to stop().
> 
> Here is the session info.
> 
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;
> LC_MONETARY=English_United States.1252;LC_NUMERIC=C;
> LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] randomForest_4.5-30
> >
> 
> Any ideas would be greatly appreciated.
> 
> W. Michael Conklin
> Chief Methodologist
> 