[R] Random Forests: Question about R^2

Liaw, Andy andy_liaw at merck.com
Tue Apr 21 14:05:50 CEST 2009


Just one small correction: in #3 it should be the sum of squared residuals.

Yes, the function returns a vector of r^2 values of length ntree, where the k-th element is the r^2 for the forest consisting of the first k trees.
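Below is a minimal Python sketch of that cumulative behaviour. It is not randomForest's own implementation. It assumes we already have every tree's prediction for every case, plus a flag saying whether the case was out-of-bag for that tree. The function and argument names are hypothetical. It makes two further assumptions. First, a case that has not yet been OOB for any of the first k trees is left out of the k-th MSE. Second, Var(y) is taken with an n denominator.

```python
from statistics import pvariance


def oob_rsq_path(y, preds, oob):
    """Hypothetical helper: R^2 for the first 1, 2, ..., ntree trees.

    y     -- observed responses, one per case
    preds -- preds[b][i] is tree b's prediction for case i
    oob   -- oob[b][i] is True when case i was out-of-bag for tree b
    """
    n = len(y)
    var_y = pvariance(y)   # assumption: n denominator, not n - 1
    totals = [0.0] * n     # running sum of OOB predictions per case
    counts = [0] * n       # number of trees so far for which the case is OOB
    path = []
    for tree_preds, tree_oob in zip(preds, oob):
        # fold in this tree's prediction only for cases it did not train on
        for i in range(n):
            if tree_oob[i]:
                totals[i] += tree_preds[i]
                counts[i] += 1
        # steps 1-3: squared residuals of the averaged OOB prediction
        # (assumption: cases never OOB so far are skipped)
        sq = [(y[i] - totals[i] / counts[i]) ** 2
              for i in range(n) if counts[i] > 0]
        mse = sum(sq) / len(sq) if sq else float("nan")
        # step 4: pseudo R^2 for the forest made of the trees seen so far
        path.append(1 - mse / var_y)
    return path
```

Called on a forest of ntree trees, it returns a list of length ntree. Its last element is the R^2 for the whole forest.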

Cheers,
Andy

From: Dimitri Liakhovitski 
> 
> I would like to summarize. Would you please confirm that my summary is
> correct? Thank you very much!
> 
> Determining R^2 in Random Forests (for a Regression Forest):
> 
> 1. For each individual case, record a mean prediction on the dependent
> variable y across all trees for which the case is OOB (Out-of-Bag);
> 2. For each individual case, calculate a residual: residual = observed
> y - mean predicted y (from step 1)
> 3. Calculate mean square residual MSE: MSE = sum of all individual
> residuals (from step 2) / n
> 4. Because MSE/var(y) represents the proportion of y variance that is
> due to error, then R^2 = 1 - MSE/var(y).
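Here are the four steps written as equations, with Andy's correction to step 3 (squared residuals) folded in. The notation is introduced here and does not come from the thread. O_i is the set of trees for which case i is out-of-bag, and f_b(x_i) is tree b's prediction for that case.

```latex
\begin{aligned}
\hat{y}_i^{\mathrm{OOB}} &= \frac{1}{|O_i|} \sum_{b \in O_i} \hat{f}_b(x_i) && \text{(step 1)} \\
r_i &= y_i - \hat{y}_i^{\mathrm{OOB}} && \text{(step 2)} \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} r_i^2 && \text{(step 3, squared residuals)} \\
R^2 &= 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y)} && \text{(step 4)}
\end{aligned}
```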
> 
> If it's correct, my last question would be:
> I am getting as many R^2 as the number of trees because each time the
> residuals are recalculated using all trees built so far, correct?
> 
> Thank you very much!
> Dimitri
> 
> 
> On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
> > Apologies: that should have been sum(residual^2)!
> >
> >> -----Original Message-----
> >> From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
> >> Sent: Monday, April 13, 2009 4:35 PM
> >> To: Liaw, Andy
> >> Cc: R-Help List
> >> Subject: Re: [R] Random Forests: Question about R^2
> >>
> >> Andy,
> >> thank you very much!
> >> One clarification question:
> >>
> >> If MSE = sum(residuals) / n, then
> >> in the formula (1 - mse / Var(y)) - shouldn't one square mse before
> >> dividing by variance?
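A tiny worked example shows why the squaring belongs inside the sum rather than being applied to MSE afterwards. The numbers are invented for illustration, and Var(y) is taken with an n denominator as an assumption. Unsquared residuals can cancel to zero. By contrast, sum(residual^2)/n already has the units of y squared, the same as Var(y), so the ratio needs no further squaring.

```latex
\begin{aligned}
&y = (1,\ 2,\ 3), \quad \hat{y}^{\mathrm{OOB}} = (1.5,\ 1.5,\ 3), \quad r = y - \hat{y}^{\mathrm{OOB}} = (-0.5,\ 0.5,\ 0) \\
&\tfrac{1}{n} \textstyle\sum_i r_i = 0 \quad \text{(cancels, which would give } R^2 = 1\text{)} \\
&\mathrm{MSE} = \tfrac{1}{n} \textstyle\sum_i r_i^2 = \tfrac{0.25 + 0.25 + 0}{3} = \tfrac{1}{6} \\
&\operatorname{Var}(y) = \tfrac{1}{n} \textstyle\sum_i (y_i - \bar{y})^2 = \tfrac{1 + 0 + 1}{3} = \tfrac{2}{3} \\
&R^2 = 1 - \frac{1/6}{2/3} = 1 - \tfrac{1}{4} = 0.75
\end{aligned}
```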
> >>
> >> Dimitri
> >>
> >>
> >> On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy <andy_liaw at merck.com> wrote:
> >> > MSE is the mean of the squared residuals.  For the training data, the
> >> > OOB estimate is used (i.e., residual = data - OOB prediction, MSE =
> >> > sum(residuals) / n, OOB prediction is the mean of predictions from all
> >> > trees for which the case is OOB).  It is _not_ the average OOB MSE of
> >> > trees in the forest.
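The distinction in the last sentence can be illustrated with a small Python sketch. The function names are hypothetical, the toy data are invented, and this is not randomForest code. In the example, two trees have errors that cancel. The forest's OOB MSE is then 0, while the average of the trees' individual OOB MSEs is 1.

```python
from statistics import mean


def forest_oob_mse(y, preds, oob):
    """MSE of the averaged OOB prediction (the quantity Andy describes)."""
    sq = []
    for i, yi in enumerate(y):
        votes = [p[i] for p, o in zip(preds, oob) if o[i]]
        if votes:  # skip cases that were never out-of-bag
            sq.append((yi - mean(votes)) ** 2)
    return mean(sq)


def average_tree_oob_mse(y, preds, oob):
    """Average over trees of each tree's own OOB MSE (what it is *not*)."""
    per_tree = []
    for p, o in zip(preds, oob):
        sq = [(yi - p[i]) ** 2 for i, yi in enumerate(y) if o[i]]
        if sq:
            per_tree.append(mean(sq))
    return mean(per_tree)


# Toy data: both cases treated as OOB for both trees, purely for illustration.
y = [0.0, 0.0]
preds = [[1.0, -1.0], [-1.0, 1.0]]  # tree 0 and tree 1 predictions
oob = [[True, True], [True, True]]
print(forest_oob_mse(y, preds, oob))        # 0.0
print(average_tree_oob_mse(y, preds, oob))  # 1.0
```

The forest averages the trees' predictions before squaring, so errors can cancel. That is why the two quantities differ.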
> >> >
> >> > I hope there's no question about how the pseudo R^2 is computed on a
> >> > test set?  If you understand how that's done, I assume the confusion is
> >> > only about how the OOB MSE is formed.
> >> >
> >> > Best,
> >> > Andy
> >> >
> >> > From: Dimitri Liakhovitski
> >> >>
> >> >> Dear Random Forests gurus,
> >> >>
> >> >> I have a question about the R^2 provided by randomForest (for
> >> >> regression). I have not been able to find this information.
> >> >>
> >> >> In the help file for randomForest under "Value" it says:
> >> >>
> >> >> rsq: (regression only) - "pseudo R-squared": 1 - mse / Var(y).
> >> >>
> >> >> Could someone please explain in somewhat more detail how exactly R^2
> >> >> is calculated?
> >> >> Is "mse" mean squared error for prediction?
> >> >> Is "mse" an average of mse's for all trees run on out-of-bag
> >> >> holdout samples?
> >> >> In other words - is this R^2 based on out-of-bag samples?
> >> >>
> >> >> Thank you very much for clarification!
> >> >>
> >> >> --
> >> >> Dimitri Liakhovitski
> >> >> MarketTools, Inc.
> >> >> Dimitri.Liakhovitski at markettools.com
> >> >>
> >> >> ______________________________________________
> >> >> R-help at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> > Notice:  This e-mail message, together with any attachments, contains
> >> > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> >> > New Jersey, USA 08889), and/or its affiliates (which may be known
> >> > outside the United States as Merck Frosst, Merck Sharp & Dohme or
> >> > MSD and in Japan, as Banyu - direct contact information for affiliates is
> >> > available at http://www.merck.com/contact/contacts.html) that may be
> >> > confidential, proprietary copyrighted and/or legally privileged. It is
> >> > intended solely for the use of the individual or entity named on this
> >> > message. If you are not the intended recipient, and have received this
> >> > message in error, please notify us immediately by reply e-mail and
> >> > then delete it from your system.
> >> >
> >> >