kan Liu
kan_liu1 at yahoo.com
Tue Jun 8 13:41:48 CEST 2004
Hi,
Thanks for your message. I tried to prove that the
R-squared of the averaged model is always greater than
or equal to the average R-squared of the individual
models (assuming m=2). Please see the attached r2.pdf.
I hope this can be generalized to the general case
(m > 2).
Any comments would be much appreciated!
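
A rough numerical check of the above, as a minimal Python sketch
(NumPy assumed; the names r2_sse, r2_corr and compare are made up
for illustration and are not taken from r2.pdf). The thread does not
say which definition of R-squared is meant, so the sketch assumes two
common ones: 1 - SSE/SST and the squared Pearson correlation. Under
the first, (b) >= (a) follows from the convexity of squared error for
any m; under the second, an extreme case such as Y1 = -Y2 (quoted
below, here read as Y1 = X, Y2 = -X) gives (b) < (a).

```python
import numpy as np

def r2_sse(x, y):
    """R-squared as 1 - SSE/SST (assumed definition; can be negative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sse = np.sum((x - y) ** 2)
    sst = np.sum((x - x.mean()) ** 2)
    return 1.0 - sse / sst

def r2_corr(x, y):
    """R-squared as the squared Pearson correlation (assumed definition)."""
    y = np.asarray(y, dtype=float)
    if np.std(y) == 0:
        # Constant prediction: correlation is undefined, treated as 0 here.
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def compare(x, preds, r2):
    """Return (a) the mean of the individual R-squared values and
    (b) the R-squared of the averaged prediction."""
    preds = np.asarray(preds, dtype=float)
    a = float(np.mean([r2(x, p) for p in preds]))
    b = float(r2(x, preds.mean(axis=0)))
    return a, b

# Simulated test set and two noisy "models" (made-up data, m = 2).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
preds = [x + rng.normal(scale=0.5, size=50) for _ in range(2)]
a_sse, b_sse = compare(x, preds, r2_sse)
print("1 - SSE/SST:  (a) =", a_sse, " (b) =", b_sse)

# Extreme case: Y1 = X, Y2 = -X, scored with squared correlation.
a_cor, b_cor = compare(x, [x, -x], r2_corr)
print("squared corr: (a) =", a_cor, " (b) =", b_cor)
```

So whether (b) >= (a) always holds seems to depend on which
definition of R-squared is used.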
Kan
Cambridge University, UK
--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> The Y1, Y2, etc. that Kan mentioned are predicted values for a test
> data set from models that were supposedly fitted to the same (or
> similar) data. It's hard for me to imagine the outcome would be as
> `severe' as Y1 = -Y2.
>
> That said, I do not think that the R-squared (or q-squared as some
> call it) of the aggregate model is necessarily larger than or equal
> to the average R-squared of the component models. It obviously
> depends on how the component models are generated. As a hypothetical
> example (I haven't actually tried it, just speculating): suppose the
> data are generated from a step function, the sort that would be
> perfect for regression trees. If one grows several well-pruned
> trees, I'd guess that the average R-squared of the individual trees
> has a chance of being larger than the R-squared of the averaged
> model.
>
> Best,
> Andy
>
> > From: Gabor Grothendieck
> >
> > Suppose m=2, Y1=Y and Y2=-Y. Then (b) is zero, so (a) must be
> > greater than or equal to (b). Thus (b) is not necessarily greater
> > than (a).
> >
> >
> > kan Liu <kan_liu1 <at> yahoo.com> writes:
> >
> > :
> > : Hi,
> > :
> > : We have a question about interpreting R-squared.
> > :
> > : The actual outputs for a test dataset are X=(x1,x2,...,xn).
> > : Model 1 predicted the outputs as Y1=(y11,y12,...,y1n)
> > : Model 2 predicted the outputs as Y2=(y21,y22,...,y2n)
> > :
> > : ...
> > : Model m predicted the outputs as Ym=(ym1,ym2,...,ymn)
> > :
> > : Now we have two ways to calculate R-squared to evaluate the
> > : average performance of the committee model.
> > :
> > : (a) Calculate the R-squared between (X, Y1), (X, Y2), ...,
> > :     (X, Ym), and then average the R-squared values.
> > : (b) Calculate the average Y=(Y1+Y2+...+Ym)/m, and then
> > :     calculate the R-squared between (X, Y).
> > :
> > : We found that the R-squared calculated in (b) seems to be
> > : 'always' higher than that in (a).
> > :
> > : Does this result depend on the test dataset, or did it happen
> > : by chance? Can you point me to any reference on this issue?
> > :
> > : Many thanks in advance!
> > :
> > : Kan
