[R] Re-evaluating the tree in the random forest

Martin Lam tmlammail at yahoo.com
Sat Sep 10 12:04:40 CEST 2005


Hi Andy,

Thank you for your help, but it doesn't quite solve my problem.
Following your iris example, if I do
"iris.rf$forest$xbestsplit[1,1] <- 1.1" instead of
"iris.rf$forest$xbestsplit[1,1] <- 3.5", then the training instances
in node 2 (the left child of the root node) are no longer split
correctly, since some of them have Petal.Length > 1.1.

So I wondered whether, after I've changed a split point, the tree
could be re-evaluated: node 2 would then keep only the training
instances with Petal.Length < 1.1 and the others would go to node 3;
from node 3 on, the instances that moved over from node 2 (those with
Petal.Length >= 1.1) would be passed further down the tree until they
reach the leaves; and finally the classifications in the leaves would
be updated.
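
To make the idea concrete, here is a rough sketch of the kind of
re-evaluation I have in mind (not part of randomForest's API; the
function name relabelTree is made up, and I'm assuming that cases
with a value <= the split point are sent to the left daughter, which
is how the getTree() output appears to be meant to be read): drop the
training data down one (possibly modified) tree matrix from getTree()
and re-label each terminal node by the majority class of the training
cases that reach it.

library(randomForest)

relabelTree <- function(tr, x, y) {
  ## tr: a tree matrix as returned by getTree()
  ## x:  the training predictors (all numeric, as in the iris example)
  ## y:  the training class labels
  x <- as.matrix(x)
  leaf <- apply(x, 1, function(row) {
    k <- 1
    while (tr[k, "status"] != -1) {           # status == -1 marks a leaf
      v <- tr[k, "split var"]
      k <- if (row[v] <= tr[k, "split point"])
             tr[k, "left daughter"] else tr[k, "right daughter"]
    }
    k
  })
  ## majority vote of the training cases that reach each leaf
  sapply(split(y, leaf), function(cl) names(which.max(table(cl))))
}

## e.g. after changing the split point of the root node:
## tr <- getTree(iris.rf, 1); tr[1, "split point"] <- 1.1
## relabelTree(tr, iris[-5], iris$Species)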

Thanks in advance,

Martin


--- "Liaw, Andy" <andy_liaw at merck.com> wrote:

> Here's an example, using the iris data:
> 
> > ## Grow one tree, using all data, and try all variables at all splits,
> > ## using large nodesize to get smaller tree.
> > iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1, nodesize=20, mtry=4,
> +                         sampsize=150, replace=FALSE)
> > getTree(iris.rf, 1)
>    left daughter right daughter split var split point status prediction
> 1              2              3         3        2.45      1          0
> 2              0              0         0        0.00     -1          1
> 3              4              5         4        1.75      1          0
> 4              6              7         3        4.95      1          0
> 5              8              9         3        4.85      1          0
> 6             10             11         4        1.65      1          0
> 7              0              0         0        0.00     -1          3
> 8              0              0         0        0.00     -1          3
> 9              0              0         0        0.00     -1          3
> 10             0              0         0        0.00     -1          2
> 11             0              0         0        0.00     -1          3
> > idx <- with(iris, Petal.Length > 2.45 & Petal.Length < 3.5)
> > predict(iris.rf, iris[idx, -5])
> [1] versicolor versicolor versicolor
> Levels: setosa versicolor virginica
> > iris.rf$forest$xbestsplit[1,1] <- 3.5
> > predict(iris.rf, iris[idx, -5])
> [1] setosa setosa setosa
> Levels: setosa versicolor virginica
> 
> Note how the predictions have changed.
> 
> HTH,
> Andy
> 
> > -----Original Message-----
> > From: Martin Lam [mailto:tmlammail at yahoo.com] 
> > Sent: Friday, September 09, 2005 9:04 AM
> > To: Liaw, Andy; r-help at stat.math.ethz.ch
> > Subject: RE: [R] Re-evaluating the tree in the random forest
> > 
> > 
> > Hi,
> > 
> > Let me give a simple example: assume a dataset containing 6
> > instances with 1 variable and the class label:
> > 
> > [x1, y]:
> > [0.5, A]
> > [3.2, B]
> > [4.5, B]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> > 
> > Assume that the randomForest algorithm creates this (2 levels
> > deep) tree:
> > 
> > Root node: question: x1 < 2.2?
> > 
> > Left terminal node:
> > [0.5, A]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> > Leaf classification: C
> > 
> > Right terminal node:
> > [3.2, B]
> > [4.5, B]
> > Leaf classification: B
> > 
> > If I change the question at the root node to "x1 < 1?", the
> > instances in the left leaf node are no longer passed down the tree
> > correctly.  My original question was whether there is a way to
> > re-evaluate the instances so that the tree becomes:
> > 
> > Root node: question: x1 < 1?
> > 
> > Left terminal node:
> > [0.5, A]
> > Leaf classification: A
> > 
> > Right terminal node:
> > [3.2, B]
> > [4.5, B]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> > Leaf classification: C
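
To spell out the check on the toy data above (just a sketch; the toy
object name is made up), the re-evaluated partition and leaf labels
under "x1 < 1?" can be verified directly:

toy <- data.frame(x1 = c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9),
                  y  = factor(c("A", "B", "B", "C", "C", "C")))
split(toy$y, ifelse(toy$x1 < 1, "left", "right"))
## $left  contains A only          -> leaf classification A
## $right contains B, B, C, C, C   -> leaf classification C (majority)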
> > 
> > Cheers,
> > 
> > Martin
> > 
> > --- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> > 
> > > > From: Martin Lam
> > > > 
> > > > Dear mailing list members,
> > > > 
> > > > I was wondering if there was a way to re-evaluate the instances
> > > > of a tree (in the forest) again after I have manually changed a
> > > > split point (or split variable) of a decision node. Here's an
> > > > illustration:
> > > > 
> > > > library("randomForest")
> > > > 
> > > > forest.rf <- randomForest(formula = Species ~ ., data = iris,
> > > >                           do.trace = TRUE, ntree = 3, mtry = 2,
> > > >                           norm.votes = FALSE)
> > > > 
> > > > # I am going to change the splitpoint of the root node of the
> > > > # first tree to 1
> > > > forest.rf$forest$xbestsplit[1,]
> > > > forest.rf$forest$xbestsplit[1,1] <- 1
> > > > forest.rf$forest$xbestsplit[1,]
> > > > 
> > > > Because I've changed the split point, some instances no longer
> > > > end up in the leaves where they should be. Is there a way to
> > > > reassign them to the correct leaf?
> > > 
> > > I'm not sure what you want to do exactly, but I suspect you can
> > > use predict().
> > >  
> > > > I was also wondering how I should interpret the output of do.trace:
> > > > 
> > > > ntree      OOB      1      2      3
> > > >     1:   3.70%  0.00%  6.25%  5.88%
> > > >     2:   3.49%  0.00%  3.85%  7.14%
> > > >     3:   3.57%  0.00%  5.56%  5.26%
> > > > 
> > > > What's OOB and what do the percentages mean?
> > > 
> > > OOB stands for `Out-of-bag'.  Read up on random forests (e.g.,
> > > the article in R News) to learn about it.  Those numbers are
> > > estimated error rates.  The `OOB' column is across all data,
> > > while the others are for the classes.
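
For reference, the same numbers can be pulled out of the fitted object
after the run (a minimal sketch, assuming the forest.rf object from
the code above; err.rate stores proportions, while do.trace prints
percentages):

forest.rf$err.rate        # one row per tree: OOB error, then per-class errors
forest.rf$err.rate[3, ]   # the numbers behind the "3:" line of do.trace
100 * forest.rf$err.rate  # as percentages, matching the do.trace display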
> > > 
> > > Andy
> > > 
> > >  
> > > > Thanks in advance,
> > > > 
> > > > Martin