[R] Re-evaluating the tree in the random forest
Martin Lam
tmlammail at yahoo.com
Fri Sep 9 15:04:07 CEST 2005
Hi,
Let me give a simple example: assume a dataset
containing six instances with one variable and the
class label:
[x1, y]:
[0.5, A]
[3.2, B]
[4.5, B]
[1.4, C]
[1.6, C]
[1.9, C]
Assume that the randomForest algorithm creates this
tree (two levels deep):
Root node: question: x1 < 2.2?
Left terminal node:
[0.5, A]
[1.4, C]
[1.6, C]
[1.9, C]
Leaf classification: C
Right terminal node:
[3.2, B]
[4.5, B]
Leaf classification: B
If I change the question at the root node to "x1 <
1?", the instances in the left leaf are no longer
routed correctly down the tree.
My original question was whether there is a way to
re-evaluate the instances so that the tree becomes:
Root node: question: x1 < 1?
Left terminal node:
[0.5, A]
Leaf classification: A
Right terminal node:
[3.2, B]
[4.5, B]
[1.4, C]
[1.6, C]
[1.9, C]
Leaf classification: C
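
For a one-split tree this re-evaluation can be done by hand: route each instance to whichever side of the new threshold it falls on, then re-label each leaf by majority vote. A minimal base-R sketch of that idea (the helper name `reroute` is my own, not part of randomForest):

```r
# Route instances through a single-split stump and re-label each
# leaf by the majority class among the instances that land there.
reroute <- function(x, y, split) {
  left  <- y[x <  split]
  right <- y[x >= split]
  majority <- function(v) names(which.max(table(v)))
  c(left = majority(left), right = majority(right))
}

x <- c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9)
y <- c("A", "B", "B", "C", "C", "C")

reroute(x, y, 2.2)  # left = "C", right = "B"
reroute(x, y, 1)    # left = "A", right = "C"
```

With the original split at 2.2 the leaves vote C and B, exactly as in the tree above; moving the split to 1 re-routes the three C instances to the right leaf, which is the re-evaluation described.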
Cheers,
Martin
--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> > From: Martin Lam
> >
> > Dear mailinglist members,
> >
> > I was wondering if there was a way to re-evaluate
> > the instances of a tree (in the forest) after I
> > have manually changed a split point (or split
> > variable) of a decision node. Here's an
> > illustration:
> >
> > library("randomForest")
> >
> > forest.rf <- randomForest(formula = Species ~ .,
> >   data = iris, do.trace = TRUE, ntree = 3,
> >   mtry = 2, norm.votes = FALSE)
> >
> > # Change the split point of the root node of the
> > # first tree to 1
> > forest.rf$forest$xbestsplit[1, ]
> > forest.rf$forest$xbestsplit[1, 1] <- 1
> > forest.rf$forest$xbestsplit[1, ]
> >
> > Because I've changed the split point, some
> > instances in the leaves are no longer where they
> > should be. Is there a way to reassign them to the
> > correct leaf?
>
> I'm not sure what you want to do exactly, but I
> suspect you can use predict().
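
A sketch of that predict() suggestion, assuming the forest's xbestsplit matrix is indexed [node, tree] (as in the version of the randomForest package used above) and that predict() reads the modified forest fields; set.seed is added only for reproducibility:

```r
library(randomForest)

set.seed(1)  # reproducible forest
forest.rf <- randomForest(Species ~ ., data = iris,
                          ntree = 3, mtry = 2)

# Change the root-node split point of the first tree
# (xbestsplit assumed to be indexed [node, tree]).
forest.rf$forest$xbestsplit[1, 1] <- 1

# predict() routes every instance down the (modified) trees
# again, so leaf assignments reflect the new split point.
pred <- predict(forest.rf, iris)
table(pred, iris$Species)
```

This does not rewrite the stored leaf labels the way the hand re-evaluation above does; it only re-routes instances through the trees as they now stand.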
>
> > I was also wondering how I should interpret the
> > output of do.trace:
> >
> > ntree OOB 1 2 3
> > 1: 3.70% 0.00% 6.25% 5.88%
> > 2: 3.49% 0.00% 3.85% 7.14%
> > 3: 3.57% 0.00% 5.56% 5.26%
> >
> > What's OOB, and what do the percentages mean?
>
> OOB stands for `Out-of-bag'. Read up on random
> forests (e.g., the article
> in R News) to learn about it. Those numbers are
> estimated error rates. The
> `OOB' column is across all data, while the others
> are for the classes.
>
> Andy
>
>
> > Thanks in advance,
> >
> > Martin
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html