The Y1, Y2, etc. that Kan mentioned are predicted values for a test set,
from models that were presumably fitted to the same (or similar) data. It's
hard for me to imagine the outcome being as 'severe' as Y1 = -Y2.
That said, I do not think that the R-squared (or q-squared, as some call it)
of the aggregate model is necessarily larger than or equal to the average
R-squared of the component models. It obviously depends on how the
component models are generated. As a hypothetical example (I haven't
actually tried it, I'm just speculating): suppose the data are
generated from a step function, the sort that would be perfect for
regression trees. If one grows several well-pruned trees, I'd guess that
the average R-squared of the individual trees has a chance of being larger
than the R-squared of the averaged model.
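
For anyone who wants to check the two calculations numerically, here is a
minimal sketch in plain Python. It is only an illustration, not anything
from Kan's actual analysis: the function names (r2_corr, compare) and the
toy numbers are made up, and I am assuming "R squared between (X, Y)" means
the squared Pearson correlation, which the thread does not actually state.
A constant prediction vector is scored as 0 by convention here, since the
correlation is undefined in that case.

```python
from statistics import mean

def r2_corr(x, y):
    """Squared Pearson correlation between x and y.

    Assumption: a constant vector (zero variance) is scored as 0.0,
    because the correlation is undefined in that case.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)

def compare(x, preds):
    """Return (a, b) for actual values x and a list of prediction vectors.

    (a) is the average of the individual R-squared values;
    (b) is the R-squared of the averaged prediction.
    """
    a = mean([r2_corr(x, p) for p in preds])
    avg_pred = [mean(col) for col in zip(*preds)]
    b = r2_corr(x, avg_pred)
    return a, b

# Hypothetical actual values for a four-point test set.
x = [1.0, 2.0, 3.0, 4.0]

# The extreme case from the thread: m = 2 and Y2 = -Y1.
a_ext, b_ext = compare(x, [[1.0, 2.0, 3.0, 4.0],
                           [-1.0, -2.0, -3.0, -4.0]])

# Two made-up models whose errors cancel when averaged.
a_mix, b_mix = compare(x, [[1.5, 1.5, 3.5, 3.5],
                           [0.5, 2.5, 2.5, 4.5]])

print("extreme case: (a) =", a_ext, " (b) =", b_ext)
print("cancelling errors: (a) =", round(a_mix, 4), " (b) =", round(b_mix, 4))
```

In the first case (a) exceeds (b), and in the second (b) exceeds (a), so
under this definition the ordering depends on the component models. A
different definition of R-squared could give a different picture.
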
> From: Gabor Grothendieck
>
> Suppose m=2, Y1=Y and Y2= -Y. Then (b) is zero, so (a) must be
> greater than or equal to (b). Thus (b) is not necessarily greater
> than (a).
>
>
> kan Liu <kan_liu1 <at> yahoo.com> writes:
>
> :
> : Hi,
> :
> : We have a question about interpreting R-squared.
> :
> : The actual outputs for a test dataset are X=(x1,x2, ..., xn).
> : model 1 predicted the outputs as Y1=(y11,y12,..., y1n)
> : model 2 predicted the outputs as Y2=(y21,y22,..., y2n)
> :
> : ...
> : model m predicted the outputs as Ym=(ym1,ym2,..., ymn)
> :
> : Now we have two ways to calculate R squared to evaluate the average
> : performance of the committee model.
> :
> : (a) Calculate R squared between (X, Y1), (X, Y2), ..., (X, Ym), and
> :     then average the R squared values.
> : (b) Calculate the average Y = (Y1 + Y2 + ... + Ym)/m, and then
> :     calculate the R squared between (X, Y).
> :
> : We found that the R squared calculated in (b) seems to be 'always'
> : higher than that in (a).
> :
> : Does this result depend on the test dataset, or did this happen by
> : chance? Can you point me to any reference on this issue?
> :
> : Many thanks in advance!
> :
> : Kan
