[R] Re-evaluating the tree in the random forest
Martin Lam
tmlammail at yahoo.com
Fri Sep 9 15:04:07 CEST 2005
Hi,
Let me give a simple example: assume a dataset
containing six instances with one variable and the
class label:
[x1, y]:
[0.5, A]
[3.2, B]
[4.5, B]
[1.4, C]
[1.6, C]
[1.9, C]
Assume that the randomForest algorithm creates this
tree (two levels deep):
Root node: question: x1 < 2.2?
Left terminal node:
[0.5, A]
[1.4, C]
[1.6, C]
[1.9, C]
Leaf classification: C
Right terminal node:
[3.2, B]
[4.5, B]
Leaf classification: B
If I change the question at the root node to "x1 <
1?", the instances in the left leaf are no longer
routed correctly down the tree.
My original question was whether there is a way to
re-evaluate the instances so that the tree becomes:
Root node: question: x1 < 1?
Left terminal node:
[0.5, A]
Leaf classification: A
Right terminal node:
[3.2, B]
[4.5, B]
[1.4, C]
[1.6, C]
[1.9, C]
Leaf classification: C
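
For a one-split tree this re-evaluation can be done by hand: route each instance to whichever side of the new threshold it falls on, then re-label each leaf by majority vote. A minimal base-R sketch of that idea (the helper name `reroute` is my own, not part of randomForest):

```r
# Route instances through a single-split stump and re-label each
# leaf by the majority class among the instances that land there.
reroute <- function(x, y, split) {
  left  <- y[x <  split]
  right <- y[x >= split]
  majority <- function(v) names(which.max(table(v)))
  c(left = majority(left), right = majority(right))
}

x <- c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9)
y <- c("A", "B", "B", "C", "C", "C")

reroute(x, y, 2.2)  # left = "C", right = "B"
reroute(x, y, 1)    # left = "A", right = "C"
```

With the original split at 2.2 the leaves vote C and B, exactly as in the tree above; moving the split to 1 re-routes the three C instances to the right leaf, which is the re-evaluation described.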
Cheers,
Martin
--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> > From: Martin Lam
> >
> > Dear mailinglist members,
> >
> > I was wondering if there was a way to re-evaluate
> > the instances of a tree (in the forest) after I
> > have manually changed a split point (or split
> > variable) of a decision node. Here's an
> > illustration:
> >
> > library("randomForest")
> >
> > forest.rf <- randomForest(formula = Species ~ .,
> >   data = iris, do.trace = TRUE, ntree = 3,
> >   mtry = 2, norm.votes = FALSE)
> >
> > # Change the split point of the root node of the
> > # first tree to 1
> > forest.rf$forest$xbestsplit[1, ]
> > forest.rf$forest$xbestsplit[1, 1] <- 1
> > forest.rf$forest$xbestsplit[1, ]
> >
> > Because I've changed the split point, some
> > instances in the leaves are no longer where they
> > should be. Is there a way to reassign them to the
> > correct leaf?
>
> I'm not sure what you want to do exactly, but I
> suspect you can use predict().
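
A sketch of that predict() suggestion, assuming the forest's xbestsplit matrix is indexed [node, tree] (as in the version of the randomForest package used above) and that predict() reads the modified forest fields; set.seed is added only for reproducibility:

```r
library(randomForest)

set.seed(1)  # reproducible forest
forest.rf <- randomForest(Species ~ ., data = iris,
                          ntree = 3, mtry = 2)

# Change the root-node split point of the first tree
# (xbestsplit assumed to be indexed [node, tree]).
forest.rf$forest$xbestsplit[1, 1] <- 1

# predict() routes every instance down the (modified) trees
# again, so leaf assignments reflect the new split point.
pred <- predict(forest.rf, iris)
table(pred, iris$Species)
```

This does not rewrite the stored leaf labels the way the hand re-evaluation above does; it only re-routes instances through the trees as they now stand.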
>
> > I was also wondering how I should interpret the
> > output of do.trace:
> >
> > ntree OOB 1 2 3
> > 1: 3.70% 0.00% 6.25% 5.88%
> > 2: 3.49% 0.00% 3.85% 7.14%
> > 3: 3.57% 0.00% 5.56% 5.26%
> >
> > What's OOB, and what do the percentages mean?
>
> OOB stands for `Out-of-bag'. Read up on random
> forests (e.g., the article
> in R News) to learn about it. Those numbers are
> estimated error rates. The
> `OOB' column is across all data, while the others
> are for the classes.
>
> Andy
>
>
> > Thanks in advance,
> >
> > Martin
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html