[R] Re-evaluating the tree in the random forest

Liaw, Andy andy_liaw at merck.com
Fri Sep 9 15:39:49 CEST 2005


Here's an example, using the iris data:

> ## Grow one tree, using all data, and try all variables at all splits,
> ## using a large nodesize to get a smaller tree.
> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1, nodesize=20, mtry=4,
+                         sampsize=150, replace=FALSE)
> getTree(iris.rf, 1)
   left daughter right daughter split var split point status prediction
1              2              3         3        2.45      1          0
2              0              0         0        0.00     -1          1
3              4              5         4        1.75      1          0
4              6              7         3        4.95      1          0
5              8              9         3        4.85      1          0
6             10             11         4        1.65      1          0
7              0              0         0        0.00     -1          3
8              0              0         0        0.00     -1          3
9              0              0         0        0.00     -1          3
10             0              0         0        0.00     -1          2
11             0              0         0        0.00     -1          3
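In the output above, a `status' of -1 marks a terminal node, and
`prediction' is the class index (1=setosa, 2=versicolor,
3=virginica).  If your version of the package has the labelVar
argument to getTree(), the same tree can be printed with variable
names and class labels instead:

> getTree(iris.rf, 1, labelVar=TRUE)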
> idx <- with(iris, Petal.Length > 2.45 & Petal.Length < 3.5)
> predict(iris.rf, iris[idx, -5])
[1] versicolor versicolor versicolor
Levels: setosa versicolor virginica
> iris.rf$forest$xbestsplit[1,1] <- 3.5
> predict(iris.rf, iris[idx, -5])
[1] setosa setosa setosa
Levels: setosa versicolor virginica

Note how the predictions have changed.
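If you want to see which terminal node each case lands in after
the edit, predict() can report that as well; a minimal sketch,
assuming your version of the package supports the nodes argument:

> pred <- predict(iris.rf, iris[-5], nodes=TRUE)
> ## terminal node (row of the getTree() output) for each case
> head(attr(pred, "nodes"))

And re-evaluating a single split by hand (along the lines of the
toy example in your message below) takes only a few lines of
plain R; restump() here is made up for illustration:

> restump <- function(x, y, cut) {
+     ## route cases down a one-split stump and take the
+     ## majority class on each side
+     list(left  = names(which.max(table(y[x <  cut]))),
+          right = names(which.max(table(y[x >= cut]))))
+ }
> x1 <- c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9)
> y  <- c("A", "B", "B", "C", "C", "C")
> restump(x1, y, 2.2)  ## original split: left = C, right = B
> restump(x1, y, 1)    ## moved split:    left = A, right = C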

HTH,
Andy

> -----Original Message-----
> From: Martin Lam [mailto:tmlammail at yahoo.com] 
> Sent: Friday, September 09, 2005 9:04 AM
> To: Liaw, Andy; r-help at stat.math.ethz.ch
> Subject: RE: [R] Re-evaluating the tree in the random forest
> 
> 
> Hi,
> 
> Let me give a simple example. Assume a dataset
> containing 6 instances, each with 1 variable and the
> class label:
> 
> [x1, y]:
> [0.5, A]
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> 
> Assume that the randomForest algorithm creates this
> (2-level-deep) tree:
> 
> Root node: question: x1 < 2.2?
> 
> Left terminal node:
> [0.5, A]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
> 
> Right terminal node:
> [3.2, B]
> [4.5, B]
> Leaf classification: B
> 
> If I change the question at the root node to "x1 <
> 1?", the instances are no longer passed down the tree
> correctly. My original question was whether there is a
> way to re-evaluate the instances, so that the tree
> becomes:
> 
> Root node: question: x1 < 1?
> 
> Left terminal node:
> [0.5, A]
> Leaf classification: A
> 
> Right terminal node:
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
> 
> Cheers,
> 
> Martin
> 
> --- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> 
> > > From: Martin Lam
> > > 
> > > Dear mailinglist members,
> > > 
> > > I was wondering if there was a way to re-evaluate
> > > the instances of a tree (in the forest) again after
> > > I have manually changed a splitpoint (or split
> > > variable) of a decision node. Here's an
> > > illustration:
> > > 
> > > library("randomForest")
> > > 
> > > forest.rf <- randomForest(formula = Species ~ .,
> > >     data = iris, do.trace = TRUE, ntree = 3,
> > >     mtry = 2, norm.votes = FALSE)
> > > 
> > > # I am going to change the splitpoint of the root
> > > # node of the first tree to 1
> > > forest.rf$forest$xbestsplit[1,]
> > > forest.rf$forest$xbestsplit[1,1] <- 1
> > > forest.rf$forest$xbestsplit[1,]
> > > 
> > > Because I've changed the splitpoint, some instances
> > > in the leaves are no longer where they should be. Is
> > > there a way to reappoint them to the correct leaf?
> > 
> > I'm not sure what you want to do exactly, but I
> > suspect you can use predict().
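> > For example, something like
> > 
> >   predict(forest.rf, iris[, -5])
> > 
> > will send every case down the (modified) trees again.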
> >  
> > > I was also wondering how I should interpret the
> > > output of do.trace:
> > > ntree      OOB      1      2      3
> > >     1:   3.70%  0.00%  6.25%  5.88%
> > >     2:   3.49%  0.00%  3.85%  7.14%
> > >     3:   3.57%  0.00%  5.56%  5.26%
> > > 
> > > What's OOB and what do the percentages mean?
> > 
> > OOB stands for `out-of-bag'. Read up on random
> > forests (e.g., the article in R News) to learn about
> > it. Those numbers are estimated error rates. The
> > `OOB' column is across all data, while the others are
> > for the classes.
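> > 
> > (If I remember right, the same per-tree numbers are
> > also stored in the fitted object: for a classification
> > forest, forest.rf$err.rate is a matrix with one row per
> > tree, with an `OOB' column plus one column per class.)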
> > 
> > Andy
> > 
> >  
> > > Thanks in advance,
> > > 
> > > Martin