[R-sig-Geo] negative r-squares

Jason Gasper Jason.Gasper at noaa.gov
Thu Sep 9 23:06:30 CEST 2010


I may be a little off base here, but wouldn't an out-of-sample R^2 
calculation be required, since you are using your test data as a 
"prediction"? The out-of-sample R^2 would then be 
1 - SSE(test data) / sum((Ytest - mean(Ytrain))^2). R^2 in this context 
is motivated as a comparison between two competing models, so a negative 
R^2 value would indicate that your out-of-sample (test data) forecasts 
are worse than simply predicting the mean.
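
In R that calculation would look roughly like the sketch below (y.train,
y.test, and pred are placeholder names for the training response, the test
response, and the model's test-set predictions; they are not objects from
the original script):

# Out-of-sample R^2: compare the model's test-set errors against a
# baseline that always predicts the mean of the *training* response.
sse.test <- sum((y.test - pred)^2)
sst.test <- sum((y.test - mean(y.train))^2)
r2.out   <- 1 - sse.test / sst.test   # < 0: worse than predicting the training mean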

caspar hallmann wrote:
> This raises the question, though, of whether one should use the mean of the
> training data or the mean of the test data when calculating the total
> sum of squares. I believe the first is fairer with respect to
> answering whether a given model is any better than the null
> model at predicting the response. When using an SST based on the mean of the
> test data, you are essentially comparing your model to a null model
> that has been based on different data (which I think isn't fair), and
> that is probably the reason why ss.err > sst, and hence R2 < 0 (the two
> baselines are contrasted in the short sketch below).
>
> Caspar
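
A minimal sketch of the two baselines Caspar distinguishes (again with
placeholder names y.train, y.test, and pred rather than objects from the
original script):

sse       <- sum((y.test - pred)^2)
sst.train <- sum((y.test - mean(y.train))^2)  # null model based on the training data
sst.test  <- sum((y.test - mean(y.test))^2)   # null model that already "saw" the test data
1 - sse / sst.train                           # R^2 against the training-mean baseline
1 - sse / sst.test                            # the version Pinar's script computes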
>
>
>
> On Thu, Sep 9, 2010 at 9:15 PM, Edzer Pebesma
> <edzer.pebesma at uni-muenster.de> wrote:
>   
>> Pinar, Jason,
>>
>> From the script below it seems no adjustment for degrees of freedom
>> is being made.
>>
>> In this case R2 can become negative because you use a different
>> test and train set. Suppose the test set contains one single
>> extreme value that is not present in the training set. In that case, the
>> mean of the test values is, in terms of sum of squares, a better
>> predictor than your regression model, which didn't know about this
>> outlier. Don't forget that the mean of the test set is computed with
>> this outlier included. Hence, R2 can easily become negative when evaluated
>> over a different data set than the one the regression model was derived from.
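
A minimal, made-up illustration of this effect (the data and object names
below are hypothetical and not from the original thread):

set.seed(42)
x.train <- runif(30); y.train <- 2 * x.train + rnorm(30, sd = 0.2)
x.test  <- runif(5);  y.test  <- 2 * x.test  + rnorm(5,  sd = 0.2)
y.test[1] <- 100                        # one extreme value only present in the test set
fit  <- lm(y ~ x, data = data.frame(x = x.train, y = y.train))
pred <- predict(fit, newdata = data.frame(x = x.test))
sse  <- sum((y.test - pred)^2)
sst  <- sum((y.test - mean(y.test))^2)  # the test-set mean "knows" the outlier
1 - sse / sst                           # comes out negative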
>>
>> On 09/09/2010 06:25 PM, Jason Gasper wrote:
>>     
>>> Hello Pinar,
>>>
>>> I don't know for sure what your calculation is, but R2 values can range
>>> from -Inf to 1 if an adjusted R2 is being used. In other words, one
>>> possibility is that you're adjusting for degrees of freedom using some
>>> variation of adjusted R2 = 1 - (1 - R2)*(n - 1)/(n - k), which reduces to
>>> the ordinary R2 when k = 1. So when the estimated R2 is less than or
>>> equal to 0, it means the model forecast is inferior to the mean
>>> (a really poor fit). Another way of looking at a negative R2 is that the
>>> fit is worse than a horizontal line, i.e. the sum of squares from the
>>> model is larger than the sum of squares around a horizontal line. Again,
>>> a poor fit.
>>>
>>> Cheers-Jason
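
As a one-line sketch of that adjustment (plain arithmetic; r2, n, and k are
placeholders for the ordinary R2, the sample size, and the number of fitted
parameters):

adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - k)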
>>>
>>>
>>> Pinar Aslantas Bostan wrote:
>>>       
>>>> Hi all,
>>>>
>>>> I am working on a comparison of kriging and regression methods. I
>>>> have one dependent variable (PREC) and seven independent variables. I created
>>>> 10 different test and train datasets. I am using the train datasets for
>>>> building the models and the test datasets for calculating the error (RMSE) and
>>>> r-squares. Once I have obtained prediction values for the grid, I use
>>>> overlay() to get predictions for the test dataset. For example:
>>>>
>>>> # regression kriging
>>>> # dem is the grid (I want to get predictions for each pixel of dem)
>>>> # and dem$rk.pred1 contains regression kriging predictions
>>>>         
>>>>> test1$rk.predicted = dem$rk.pred1[overlay(dem, test1)]
>>>>>           
>>>> # calculating r-square values based on test values
>>>>         
>>>>> ss <- (test1$PREC - mean(test1$PREC))^2
>>>>> sst1 <- sum(ss)
>>>>> e <- (test1$PREC - test1$rk.predicted)^2
>>>>> sse.rk <- sum(e)
>>>>> rk1.r.square <- 1 - (sse.rk / sst1)
>>>>>           
>>>> My problem is that, for some datasets, the methods result in
>>>> negative r-squares. Here I gave an example for regression kriging,
>>>> but the same problem may also occur for linear regression. I checked the
>>>> dependent and independent variables and there is no problem with them.
>>>> Does anyone know another function instead of overlay() that serves
>>>> the same purpose? (I thought that maybe the problem is caused by the
>>>> overlay function.) Or do you have any idea about the reason for the
>>>> negative r-square values?
>>>>
>>>> Best regards,
>>>> Pinar
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> Pinar Aslantas Bostan
>>>> Research Assistant
>>>> Department of Geodetic and
>>>> Geographic Information Technologies (GGIT)
>>>> Middle East Technical University
>>>> 06531 Ankara/TURKEY
>>>> aslantas at metu.edu.tr
>>>>
>>>>         
>> --
>> Edzer Pebesma
>> Institute for Geoinformatics (ifgi), University of Münster
>> Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
>> 8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
>> http://www.52north.org/geostatistics      e.pebesma at wwu.de
>>
>>
>>     
>
>


