Kan Liu
Mon Jun 7 11:16:40 CEST 2004
Hi,
We have a question about interpreting R-squared.
The actual outputs for a test dataset are X=(x1,x2,
..., xn).
model 1 predicted the outputs as Y1=(y11,y12,..., y1n)
model 2 predicted the outputs as Y2=(y21,y22,..., y2n)
...
model m predicted the outputs as Ym=(ym1,ym2,..., ymn)
Now we have two ways to calculate R squared to
evaluate the average performance of the committee model.
(a) Calculate R squared between (X, Y1), (X, Y2), ...,
(X, Ym), and then average the m R squared values.
(b) Calculate the average prediction
Y=(Y1+Y2+...+Ym)/m, and then calculate the R squared
between (X, Y).
We found that the R squared calculated in (b) seems to
be 'always' higher than that in (a).
Does this result depend on the test dataset, or did it
happen by chance? Can you recommend any references on
this issue?
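
To make the comparison concrete, here is a minimal Python sketch of
calculations (a) and (b). It assumes R squared means 1 - SSE/SST; the
post does not say which definition is used, and the squared correlation
is another common choice. The function names and the toy data are made
up for illustration only.

```python
def r_squared(actual, predicted):
    # ASSUMPTION: R squared is defined as 1 - SSE/SST, not squared correlation.
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


def r2_average_of_models(actual, predictions):
    # Method (a): R squared of each model against X, then the mean of those values.
    return sum(r_squared(actual, y) for y in predictions) / len(predictions)


def r2_of_averaged_prediction(actual, predictions):
    # Method (b): average the m predictions point by point, then one R squared.
    m = len(predictions)
    averaged = [sum(column) / m for column in zip(*predictions)]
    return r_squared(actual, averaged)


# Hypothetical toy data: X holds actual outputs, Ys holds m = 3 model predictions.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Ys = [
    [1.5, 1.5, 3.5, 3.5, 5.5],
    [0.5, 2.5, 2.5, 4.5, 4.5],
    [1.0, 2.0, 4.0, 4.0, 6.0],
]

r2_a = r2_average_of_models(X, Ys)
r2_b = r2_of_averaged_prediction(X, Ys)
print("(a) mean of per-model R squared:", r2_a)
print("(b) R squared of averaged prediction:", r2_b)
```

Under this assumed definition, squared error is convex in the
prediction, so the SSE of the averaged prediction cannot exceed the
average of the individual SSEs, and (b) is at least as large as (a) for
any dataset. That argument does not carry over automatically if R
squared means the squared correlation between X and the predictions.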
Many thanks in advance!
Kan
