[R] R2 function from PLS to use a model on test data

Tue Aug 3 10:25:09 CEST 2010

Addi Wei <addiwei at gmail.com> writes:

> Hello,  
>    I am having some trouble using a model I created with plsr (on the
> training data) to analyze the individual R^2 of each of the 10 components
> against the test data.  For example:
>
> mice1 <- plsr(response ~ factors, ncomp=10, data=MiceTrain)
> R2(mice1)    ## this provides the correct R2 for the Train data for 10
>              ## components
> ## Now my next objective is to calculate my model's R2 for each component on
> the Test data (in other words, to test how good the model is on test data).
> The only thing I need is the MiceTest.response, to compare with
> predict(mice1, ncomp=1, newdata=MiceTest), and I should be able to calculate
> R2, but I can't figure out the correct command to do this.  I tried the
> command below, which does provide a different R2 response; however, I'm not
> sure it is correct, as I get a different R^2 value from another software
> package, MOE (Molecular Operating Environment).
>
> R2(mice1, estimate="test", MiceTest)
>
> Is the above the correct code to achieve what I'm doing?  If so, then MOE
> probably uses a different function to calculate the model component's R^2
> for Test data.

That is the way to get test set "R^2" for PLSR/PCR models, yes.

If you read the documentation of R2, you will find:

     The R^2 values returned by '"R2"' are calculated as 1 - SSE/SST,
     where SST is the (corrected) total sum of squares of the response,
     and SSE is the sum of squared errors for either the fitted values
     (i.e., the residual sum of squares), test set predictions or
     cross-validated predictions (i.e., the PRESS).

This is, AFAIK, the most common way to define R^2.  For training data,
this is equivalent to cor(y, yhat)^2, where yhat are the fitted values,
but the two generally differ for test data or cross-validation.

From your second email, I would guess that MOE uses cor(y, yhat)^2 instead
of 1 - SSE/SST.
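
To see how the two definitions can disagree on test data, here is a minimal
sketch in base R.  It is an illustration only: it uses lm() on simulated data
instead of plsr() so that it runs without the pls package, and the helper
names r2_sse and r2_cor are made up for this example.  SST is taken around
the mean of whichever response vector is passed in.

```r
# Hypothetical helpers (not part of the pls package).
# R^2 as 1 - SSE/SST, with SST centred on the mean of y.
r2_sse <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
# R^2 as the squared correlation between observed and predicted values.
r2_cor <- function(y, yhat) cor(y, yhat)^2

# Simulated training and test sets (assumed data, for illustration only).
set.seed(1)
train <- data.frame(x = rnorm(50))
train$y <- 2 * train$x + rnorm(50)
test <- data.frame(x = rnorm(30))
test$y <- 2 * test$x + rnorm(30)

# Least-squares fit with an intercept, standing in for the PLSR model.
fit <- lm(y ~ x, data = train)
yhat_train <- fitted(fit)
yhat_test <- predict(fit, newdata = test)

# On the training data the two definitions agree.
train_sse <- r2_sse(train$y, yhat_train)
train_cor <- r2_cor(train$y, yhat_train)

# On the test data they generally differ.
test_sse <- r2_sse(test$y, yhat_test)
test_cor <- r2_cor(test$y, yhat_test)

# A constant bias in the predictions leaves cor()^2 unchanged,
# but lowers 1 - SSE/SST.
shifted_sse <- r2_sse(test$y, yhat_test + 5)
shifted_cor <- r2_cor(test$y, yhat_test + 5)

print(c(train_sse = train_sse, train_cor = train_cor))
print(c(test_sse = test_sse, test_cor = test_cor))
print(c(shifted_sse = shifted_sse, shifted_cor = shifted_cor))
```

The squared correlation ignores any shift or rescaling of the predictions,
so on the same data it can never be smaller than 1 - SSE/SST.  A program
that reports cor(y, yhat)^2 would therefore tend to show a test set R^2
that is the same or higher.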

-- 
Bjørn-Helge Mevik