[R-pkgs] randomForest 4.3-0 released

Thu Jul 8 16:06:45 CEST 2004

Dear all,

Version 4.3-0 of the randomForest package is now available on CRAN (in
source; binaries will follow in due course).  There are some interface
changes and a few new features, as well as bug fixes.  For those who have
used previous versions, the important things to note are: 1. there's a
namespace now, and 2. some functions have been renamed.  The list of changes
since 4.0-7 (last public release) is shown below.

As many changes were made to the package, it's very likely that new bugs
have crept in.  I'd very much appreciate bug reports or even patches!

The plan is still to add features to the package so that it matches the
features in Breiman and Cutler's latest Fortran version.  There is also a
plan to add some functions so that the package will work with Adele
Cutler's Java visualization program (RAFT).

Best,
Andy

====================================================
Changes in 4.3-0:

* Thanks to Adele Cutler, there are now casewise variable importance 
  measures in classification.  A similar feature has also been added for 
  regression.  Use the new localImp option in randomForest().

* The `importance' component of the randomForest object has been changed:  
  The permutation-based measures are no longer divided by their `standard 
  errors'.  Instead, the `standard errors' are stored in the 
  `importanceSD' component.  One should use the importance() extractor 
  function rather than something like rf.obj$importance for extracting 
  the importance measures.

* The importance() extractor function has been updated:  If the 
  permutation-based measures are available, calling importance() 
  with only a randomForest object returns the matrix of variable 
  importance measures.  There is a `scale' argument, which defaults 
  to TRUE and controls whether the permutation-based measures are 
  divided by their `standard errors'.

* In predict.randomForest, there is a new argument `nodes' (defaults to 
  FALSE).  For classification, if nodes=TRUE, the returned object has an
  attribute `nodes', which is an n by ntree matrix of terminal node
  indicators.  This is ignored for regression.
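
As a rough illustration of the 4.3-0 items above, here is a minimal sketch.
It assumes the built-in iris data and small ntree values chosen only for
speed; the object names (rf, imp.scaled, imp.raw, node.mat) are made up for
the example and are not part of the package.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the run is repeatable

# localImp = TRUE asks for the casewise importance measures; this also
# makes the permutation-based measures available.
rf <- randomForest(Species ~ ., data = iris, ntree = 100, localImp = TRUE)

# Use the extractor instead of reaching into rf$importance directly.
imp.scaled <- importance(rf)                 # scale = TRUE is the default
imp.raw    <- importance(rf, scale = FALSE)  # measures left unscaled

# The `standard errors' are kept in their own component.
imp.sd <- rf$importanceSD

# nodes = TRUE attaches the terminal node indicators as an attribute.
pred     <- predict(rf, iris, nodes = TRUE)
node.mat <- attr(pred, "nodes")
dim(node.mat)  # one row per case, one column per tree
```

Comparing imp.scaled with imp.raw is a quick way to see what the `scale'
argument does to the permutation-based columns.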

Changes in 4.2-1:

* There is now a package name space.  Only generics are exported.

* Some function names have been changed: 
    partial.plot -> partialPlot
    var.imp.plot -> varImpPlot
    var.used     -> varUsed

* There is a new option `replace' in randomForest() (defaults to TRUE)
  indicating whether the sampling of cases is with or without
  replacement. 

* In randomForest(), the `sampsize' option now works for both
  classification and regression, and indicates the number of cases to be 
  drawn to grow each tree.  For classification, if sampsize is a vector 
  whose length equals the number of classes, then sampling is stratified 
  by class.

* With the formula interface for randomForest(), the default na.action,
  na.fail, is now in effect.  I.e., an error is given if there are NAs 
  present in the data.  If na.omit is desired, it must be given explicitly.

* For classification, the err.rate component of the randomForest object
  (and the corresponding one for test set) is now an ntree by (nclass + 1)
  matrix, the first column of which contains the overall error rate, and
  the remaining columns the class error rates.  The running output now
  also prints class error rates.  The plot method for randomForest will
  plot the class error rates as well.

* The predict() method now checks whether the variable names in newdata 
  match those from the training data (if the randomForest object is not
  created from the formula interface).

* partialPlot() and varImpPlot() now have optional arguments xlab, ylab
  and main for more flexible labelling.  Also, if a factor is given as
  the variable, a real bar plot is produced.

* partialPlot() will now remove rows with NAs from the data frame given.

* Fixed: for regression, if proximity=FALSE, an n by n array of integers 
  was erroneously allocated but not used (it's only used for proximity 
  calculation, so not needed otherwise).

* Updated combine() to conform to the new randomForest object.

* na.roughfix() was not working correctly for matrices, which in turn 
  caused problems in rfImpute().
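
A few of the 4.2-1 items above can be tried out along the following lines.
This is only a sketch: the iris data, the per-class sample size of 30, the
ntree values and the object names (rf, iris.na, res, rf.omit) are all
assumptions made for the example.

```r
library(randomForest)

set.seed(2)  # arbitrary seed, only so the run is repeatable

# Draw 30 cases per class, without replacement, to grow each tree.
# iris has 3 classes, so a length-3 sampsize stratifies by class.
rf <- randomForest(Species ~ ., data = iris, ntree = 50,
                   replace = FALSE, sampsize = c(30, 30, 30))

# err.rate has one row per tree: overall error rate first, then one
# column for each class.
dim(rf$err.rate)

# One of the renamed functions (formerly var.used).
used <- varUsed(rf)

# With the formula interface, NAs are an error unless na.action says
# otherwise.
iris.na <- iris
iris.na[1, "Sepal.Length"] <- NA
res <- try(randomForest(Species ~ ., data = iris.na, ntree = 50),
           silent = TRUE)
inherits(res, "try-error")  # the default na.fail stops the call

# Dropping incomplete rows has to be requested explicitly.
rf.omit <- randomForest(Species ~ ., data = iris.na, ntree = 50,
                        na.action = na.omit)
```

plot(rf) on the first forest should then show the class error rates along
with the overall one.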

Changes in 4.1-0:

* In randomForest(), if sampsize is given, the sampling is now done
  without replacement, in addition to being stratified by class.  
  Therefore sampsize cannot be larger than the class frequencies.

* In classification randomForest, checks are added to avoid trees with 
  only the root node.

* Fixed a bug in the Fortran code for classification that caused a 
  segfault on some systems when encountering a tree with only the root 
  node.

* The help page for predict.randomForest() now states the fact that when 
  newdata is not specified, the OOB predictions from the randomForest 
  object are returned.

* plot.randomForest() and print.randomForest() were not checking for 
  existence of performance (err.rate or mse) on test data correctly.
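
The point about predict() without newdata is easy to overlook, so here is a
small sketch of the difference.  The iris data, the ntree value and the
object names (rf, oob, resub) are assumptions for the example only.

```r
library(randomForest)

set.seed(3)  # arbitrary seed, only so the run is repeatable
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# No newdata: these are the out-of-bag predictions stored in the object.
oob <- predict(rf)

# Passing the training data back in runs every case down every tree,
# including the trees that case helped to grow.
resub <- predict(rf, iris)

# The OOB error is the honest estimate; the second number is usually
# far too optimistic.
mean(oob   != iris$Species)
mean(resub != iris$Species)
```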