[R] question about correlation coefficient and root mean square (with code used)

Wed Aug 2 19:12:25 CEST 2006

Dear all,

I am using two multiple regression models, OLS and principal component
regression (PCR), to make predictions for my test set. Both models are
fitted on the same training set, except that OLS uses fewer variables or
descriptors (columns of X) than PCR.

I use the squared correlation coefficient (r^2) and the root mean square
(RMS) error to compare my predictions with the experimental measurements
of the test set. Here is the problem:

The r^2 of my PCR predictions is higher than the r^2 of my OLS predictions
(0.8 vs. 0.7). However, the RMS of the PCR predictions is also higher than
that of OLS (0.55 vs. 0.48). I would expect r^2 and RMS to show a
consistent trend (r^2 increases and RMS decreases, or the opposite). Why am
I getting results that point in opposite directions? Is it because PCR is a
biased method? Which one (r^2 or RMS) is the more reliable measure for
evaluating the model?

Here is the simple code I used for calculating r^2 and RMS in R (test set
size is 40):

# squared correlation between observed and predicted values
r2 <- cor(test$p50, test.pred$fit)^2

# root mean square error of the predictions
rms <- sqrt(mean((test.pred$fit - test$p50)^2))
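
To make the puzzle concrete, here is a toy sketch with made-up numbers (not my real data). The names obs, pred_a, pred_b and the helper functions are hypothetical and only stand in for test$p50 and test.pred$fit. It assumes one set of predictions is noisy but unbiased and the other is tighter but shifted by a constant offset, which seems to reproduce the same pattern: the shifted predictions get the higher r^2 and also the higher RMS, because the correlation ignores a constant offset while the RMS does not.

```r
# Toy data: 40 "observed" values, evenly spaced (stand-in for test$p50)
obs <- seq(4, 6, length.out = 40)

# Prediction A: no systematic offset, but errors of +/- 0.5
pred_a <- obs + rep(c(0.5, -0.5), 20)

# Prediction B: small errors of +/- 0.1, but shifted upward by 0.6
pred_b <- obs + 0.6 + rep(c(0.1, -0.1), 20)

# Helper functions (same formulas as in my code above)
r2   <- function(y, yhat) cor(y, yhat)^2
rms  <- function(y, yhat) sqrt(mean((yhat - y)^2))
bias <- function(y, yhat) mean(yhat - y)   # mean prediction error

results <- data.frame(
  model = c("A (noisy, unbiased)", "B (tight, shifted)"),
  r2    = c(r2(obs, pred_a),   r2(obs, pred_b)),
  rms   = c(rms(obs, pred_a),  rms(obs, pred_b)),
  bias  = c(bias(obs, pred_a), bias(obs, pred_b))
)
print(results)
# B has the higher r2 AND the higher rms (0.5 for A, sqrt(0.37) for B)
```

If something like this is what happens with PCR, then comparing mean(test.pred$fit - test$p50) for the two models might show whether an offset explains the difference, but that is only my guess.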

I would really appreciate your help!

Sincerely,
Jeny