[R] Re-evaluating the tree in the random forest
Martin Lam
tmlammail at yahoo.com
Sat Sep 10 12:04:40 CEST 2005
Hi Andy,
Thank you for your help, but it does not really solve
my problem. Following your iris example, if I set
"iris.rf$forest$xbestsplit[1,1] <- 1.1" instead of
"iris.rf$forest$xbestsplit[1,1] <- 3.5", then the
training instances in node 2 (the left daughter of the
root node) are no longer correctly split, since node 2
still holds training instances with Petal.Length > 1.1.
So I wondered whether, after I change a split point,
the tree can be made to keep only the training
instances with Petal.Length < 1.1 in node 2 and send
the others to node 3. From node 3 on, the instances
that came from node 2 (those with Petal.Length >= 1.1)
would be passed down the tree until they reach the
leaves, and finally the classifications in the leaves
would be updated.
Thanks in advance,
Martin
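
A minimal base-R sketch of the re-evaluation described above. It assumes the tree is available as a matrix in the layout printed by getTree(); the matrix below is copied by hand from the quoted output, with the root split point set to 1.1. The helpers route() and relabel() are hypothetical names and are not part of randomForest. For numeric predictors randomForest sends values less than or equal to the split point to the left daughter, so the sketch uses "<=" rather than "<".

```r
# Tree copied from the quoted getTree(iris.rf, 1) output, with the
# root split point changed from 2.45 to 1.1.
tree <- matrix(c(
   2,  3, 3, 1.10,  1, 0,
   0,  0, 0, 0.00, -1, 1,
   4,  5, 4, 1.75,  1, 0,
   6,  7, 3, 4.95,  1, 0,
   8,  9, 3, 4.85,  1, 0,
  10, 11, 4, 1.65,  1, 0,
   0,  0, 0, 0.00, -1, 3,
   0,  0, 0, 0.00, -1, 3,
   0,  0, 0, 0.00, -1, 3,
   0,  0, 0, 0.00, -1, 2,
   0,  0, 0, 0.00, -1, 3
), ncol = 6, byrow = TRUE,
dimnames = list(NULL, c("left", "right", "var", "point", "status", "pred")))

# Hypothetical helper: terminal node reached by each row of x.
# Values <= the split point go to the left daughter.
route <- function(tree, x) {
  x <- as.matrix(x)
  vapply(seq_len(nrow(x)), function(i) {
    node <- 1
    while (tree[node, "status"] == 1) {
      go_left <- x[i, tree[node, "var"]] <= tree[node, "point"]
      node <- if (go_left) tree[node, "left"] else tree[node, "right"]
    }
    node
  }, numeric(1))
}

# Hypothetical helper: relabel every reached leaf by majority vote of
# the training instances that now end up in it (ties go to the first class).
relabel <- function(tree, x, y) {
  leaf  <- route(tree, x)
  votes <- table(leaf, as.integer(y))
  best  <- apply(votes, 1, which.max)
  tree[as.integer(rownames(votes)), "pred"] <- as.integer(colnames(votes)[best])
  tree
}

leaf     <- route(tree, iris[-5])
new_tree <- relabel(tree, iris[-5], iris$Species)
table(leaf, iris$Species)   # where the training instances end up now
```

The sketch only edits a copy of the tree matrix. It does not write anything back into the fitted iris.rf object, and leaves that no training instance reaches keep their old labels.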
--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> Here's an example, using the iris data:
>
> > ## Grow one tree, using all data, and try all variables at all splits,
> > ## using large nodesize to get smaller tree.
> > iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1, nodesize=20, mtry=4,
> +                         sampsize=150, replace=FALSE)
> > getTree(iris.rf, 1)
>    left daughter right daughter split var split point status prediction
> 1              2              3         3        2.45      1          0
> 2              0              0         0        0.00     -1          1
> 3              4              5         4        1.75      1          0
> 4              6              7         3        4.95      1          0
> 5              8              9         3        4.85      1          0
> 6             10             11         4        1.65      1          0
> 7              0              0         0        0.00     -1          3
> 8              0              0         0        0.00     -1          3
> 9              0              0         0        0.00     -1          3
> 10             0              0         0        0.00     -1          2
> 11             0              0         0        0.00     -1          3
> > idx <- with(iris, Petal.Length > 2.45 & Petal.Length < 3.5)
> > predict(iris.rf, iris[idx, -5])
> [1] versicolor versicolor versicolor
> Levels: setosa versicolor virginica
> > iris.rf$forest$xbestsplit[1,1] <- 3.5
> > predict(iris.rf, iris[idx, -5])
> [1] setosa setosa setosa
> Levels: setosa versicolor virginica
>
> Note how the predictions have changed.
>
> HTH,
> Andy
>
> > -----Original Message-----
> > From: Martin Lam [mailto:tmlammail at yahoo.com]
> > Sent: Friday, September 09, 2005 9:04 AM
> > To: Liaw, Andy; r-help at stat.math.ethz.ch
> > Subject: RE: [R] Re-evaluating the tree in the random forest
> >
> >
> > Hi,
> >
> > Let me give a simple example. Assume a dataset
> > containing 6 instances with 1 variable and the class
> > label:
> >
> > [x1, y]:
> > [0.5, A]
> > [3.2, B]
> > [4.5, B]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> >
> > Assume that the randomForest algorithm creates this
> > (2 levels deep) tree:
> >
> > Root node: question: x1 < 2.2?
> >
> > Left terminal node:
> > [0.5, A]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> > Leaf classification: C
> >
> > Right terminal node:
> > [3.2, B]
> > [4.5, B]
> > Leaf classification: B
> >
> > If I change the question at the root node to "x1 <
> > 1?", the instances in the left leaf node are not
> > correctly passed down the tree anymore.
> > My original question was if there was a way to
> > re-evaluate the instances again into:
> >
> > Root node: question: x1 < 1?
> >
> > Left terminal node:
> > [0.5, A]
> > Leaf classification: A
> >
> > Right terminal node:
> > [3.2, B]
> > [4.5, B]
> > [1.4, C]
> > [1.6, C]
> > [1.9, C]
> > Leaf classification: C
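
The two partitions above can be reproduced with a few lines of base R. This is only a sketch of the idea for a single root split: resplit() is a hypothetical helper, it uses the strict "x1 < cut" question from the message rather than randomForest's own splitting rule, and it labels each leaf by majority vote.

```r
# Toy data from the message above (six instances, one variable).
toy <- data.frame(x1 = c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9),
                  y  = factor(c("A", "B", "B", "C", "C", "C")))

# Hypothetical helper: partition on "x1 < cut" and label each leaf
# with the most frequent class among the instances it receives.
resplit <- function(d, cut) {
  side  <- factor(ifelse(d$x1 < cut, "left", "right"),
                  levels = c("left", "right"))
  votes <- table(side, d$y)
  data.frame(leaf  = rownames(votes),
             n     = as.vector(rowSums(votes)),
             label = colnames(votes)[apply(votes, 1, which.max)])
}

resplit(toy, 2.2)  # original question at the root
resplit(toy, 1)    # modified question at the root
```

With cut = 2.2 the leaves are labelled C and B, and with cut = 1 they are labelled A and C, matching the two trees written out above.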
> >
> > Cheers,
> >
> > Martin
> >
> > --- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> >
> > > > From: Martin Lam
> > > >
> > > > Dear mailinglist members,
> > > >
> > > > I was wondering if there was a way to re-evaluate the
> > > > instances of a tree (in the forest) again after I have
> > > > manually changed a splitpoint (or split variable) of a
> > > > decision node. Here's an illustration:
> > > >
> > > > library("randomForest")
> > > >
> > > > forest.rf <- randomForest(formula = Species ~ ., data = iris,
> > > >                           do.trace = TRUE, ntree = 3, mtry = 2,
> > > >                           norm.votes = FALSE)
> > > >
> > > > # I am going to change the splitpoint of the root node
> > > > # of the first tree to 1
> > > > forest.rf$forest$xbestsplit[1,]
> > > > forest.rf$forest$xbestsplit[1,1] <- 1
> > > > forest.rf$forest$xbestsplit[1,]
> > > >
> > > > Because I've changed the splitpoint, some instances in
> > > > the leaves are no longer where they should be. Is
> > > > there a way to reassign them to the correct leaf?
> > >
> > > I'm not sure what you want to do exactly, but I suspect you can use
> > > predict().
> > >
> > > > I was also wondering how I should interpret the output
> > > > of do.trace:
> > > >
> > > > ntree OOB 1 2 3
> > > > 1: 3.70% 0.00% 6.25% 5.88%
> > > > 2: 3.49% 0.00% 3.85% 7.14%
> > > > 3: 3.57% 0.00% 5.56% 5.26%
> > > >
> > > > What's OOB and what do the percentages mean?
> > >
> > > OOB stands for `Out-of-bag'. Read up on random forests (e.g., the article
> > > in R News) to learn about it. Those numbers are estimated error rates. The
> > > `OOB' column is across all data, while the others are for the classes.
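
As a rough illustration of both ideas: the out-of-bag instances of a tree are the ones that were not drawn into its bootstrap sample, and each row of the do.trace output holds an overall error rate followed by one error rate per true class. The sketch below only shows the bookkeeping. error_rates() is a hypothetical helper, and the truth and pred vectors are made-up labels, not output from randomForest.

```r
# Out-of-bag set for one bootstrap sample of the iris rows.
set.seed(1)
n      <- nrow(iris)
in_bag <- sample(n, n, replace = TRUE)    # rows drawn for this tree
oob    <- setdiff(seq_len(n), in_bag)     # rows never drawn
length(oob) / n                           # fraction left out of the bag

# Hypothetical helper: overall and per-class error rates, laid out like
# one row of the do.trace output (overall first, then one per class).
error_rates <- function(truth, pred) {
  wrong <- truth != pred
  c(OOB = mean(wrong), tapply(wrong, truth, mean))
}

# Made-up labels, only to show the arithmetic.
truth <- factor(c("a", "a", "b", "b", "b", "c"))
pred  <- factor(c("a", "b", "b", "b", "b", "c"), levels = levels(truth))
error_rates(truth, pred)
```

In a real forest the predictions compared against the truth would come only from trees for which the instance was out of bag, and do.trace prints the rates as percentages.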
> > >
> > > Andy
> > >
> > >
> > > > Thanks in advance,
> > > >
> > > > Martin
> > > > ______________________________________________
> > > > R-help at stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide!
> > > > http://www.R-project.org/posting-guide.html
--------------------------------------------------------------
> > ----------------
> > > Notice: This e-mail message, together with any
> > > attachments, contains information of Merck &
> Co.,
> > > Inc. (One Merck Drive, Whitehouse Station, New
> > > Jersey, USA 08889), and/or its affiliates (which
> may
> > > be known outside the United States as Merck
> Frosst,
> > > Merck Sharp & Dohme or MSD and in Japan, as
> Banyu)
> > > that may be confidential, proprietary
> copyrighted
> > > and/or legally privileged. It is intended solely
> for
> > > the use of the individual or entity named on
> this
> > > message. If you are not the intended recipient,
> and
> > > have received this message in error, please
> notify
> > > us immediately by reply e-mail and then delete
> it
> > > from your system.
> > >
> >
>
--------------------------------------------------------------
> > ----------------