[R-sig-Geo] negative r-squares

caspar hallmann caspar.hallmann at gmail.com
Thu Sep 9 22:33:58 CEST 2010


This raises the question, though, of whether one should use the mean of
the training data or the mean of the test data when calculating the
total sum of squares. I believe the former is fairer for answering
whether a given model predicts the response any better than the null
model. When SST is based on the mean of the test data, you are
essentially comparing your model to a null model that was built from
different data (which I don't think is fair), and that is probably the
reason why ss.err > sst, and hence R2 < 0.

Caspar
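A small sketch of Caspar's point in R. The helper `oos_r2()` and all numbers below are hypothetical, chosen only so the arithmetic is easy to check by hand: the test set is shifted upward relative to the training set, and the (unspecified) model's predictions are off by 2 at every test point. The same SSE is divided by an SST taken either around the test mean (as in Pinar's script) or around the training mean (as Caspar proposes).

```r
# Out-of-sample R^2 = 1 - SSE / SST, where SST is computed around a chosen
# reference mean: by default the test-set mean (as in Pinar's script).
oos_r2 <- function(obs, pred, ref_mean = mean(obs)) {
  sse <- sum((obs - pred)^2)        # error sum of squares of the model
  sst <- sum((obs - ref_mean)^2)    # sum of squares of the null model
  1 - sse / sst
}

# Hypothetical values for illustration only.
train <- c(10, 12, 14)   # training responses, mean 12
obs   <- c(20, 22, 24)   # test responses, mean 22
pred  <- c(18, 20, 22)   # model predictions, each 2 too low

oos_r2(obs, pred)                          # 1 - 12/8   = -0.5
oos_r2(obs, pred, ref_mean = mean(train))  # 1 - 12/308, about 0.961
```

Because a sum of squared deviations is smallest around the sample's own mean, SST computed around the training mean is never smaller than SST around the test mean. Since both versions share the same SSE, the training-mean R2 is always at least as large as the test-mean R2; only the null model differs.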



On Thu, Sep 9, 2010 at 9:15 PM, Edzer Pebesma
<edzer.pebesma at uni-muenster.de> wrote:
> Pinar, Jason,
>
> From the script below it seems no adjustment for degrees of freedom
> is being made.
>
> In this case R2 can become negative because you use different test
> and training sets. Suppose the test set contains a single extreme
> value that is not present in the training set. In that case, the
> mean of the test values is, in terms of sum of squares, a better
> predictor than your regression model, which didn't know about this
> outlier. Don't forget that the mean of the test set does contain
> this outlier. Hence, R2 can easily become negative when evaluated
> over a different data set than the one the regression model was derived from.
>
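Edzer's scenario can be reproduced with a toy example in R. The data are hypothetical and noise-free so the numbers can be checked by hand: the model is fitted to points lying exactly on y = 2x + 1, and the test set contains one extreme value (43 where the line predicts 13). R2 is computed exactly as in Pinar's script, with SST taken around the test mean.

```r
# Hypothetical, noise-free training data on the line y = 2x + 1.
train <- data.frame(x = 1:10)
train$y <- 2 * train$x + 1
fit <- lm(y ~ x, data = train)

# Hypothetical test set: two points on the line, plus one extreme value
# (43 instead of 13) that the training data never contained.
test <- data.frame(x = c(5, 5.5, 6), y = c(11, 12, 43))
pred <- predict(fit, newdata = test)     # 11, 12, 13

sse <- sum((test$y - pred)^2)            # (43 - 13)^2 = 900
sst <- sum((test$y - mean(test$y))^2)    # mean is 22: 121 + 100 + 441 = 662
r2  <- 1 - sse / sst                     # 1 - 900/662, about -0.36
r2
```

The regression line is perfect on the training data, yet the single unseen outlier adds 900 to SSE, while SST grows only to 662 because the test mean itself is pulled toward the outlier.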
> On 09/09/2010 06:25 PM, Jason Gasper wrote:
>> Hello Pinar,
>>
>> I don't know for sure what your calculation is, but R2 values can range
>> from -inf to 1 if an adjusted R2 is being used. One possibility is
>> that you're adjusting for degrees of freedom using some variation of
>> 1 - ((n-1)/(n-k))(1-R2), where the adjusted R2 reduces to the plain R2
>> when k=1. So when the estimated R2 is less than or equal to 0, the
>> model forecast is inferior to the mean (a really poor fit). Another way
>> of looking at a negative R2 is that the fit is worse than a horizontal
>> line: the sum of squares from the model is larger than the sum of
>> squares from a horizontal line. Again, a poor fit.
>>
>> Cheers-Jason
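Jason's adjustment can be checked numerically. The sketch below assumes k counts every estimated coefficient, intercept included; under that convention the formula reproduces the `adj.r.squared` reported by R's `summary()` for an `lm` fit. The function name `adjusted_r2()` and the example values are hypothetical. Pinar's script computes the unadjusted R2, so this only matters if an adjustment is actually made.

```r
# Adjusted R^2 = 1 - ((n - 1) / (n - k)) * (1 - R^2).
# Assumption: k = number of estimated coefficients, intercept included.
adjusted_r2 <- function(r2, n, k) 1 - (n - 1) / (n - k) * (1 - r2)

# A weak fit with an intercept plus seven predictors (k = 8) and n = 20:
adjusted_r2(0.05, n = 20, k = 8)   # 1 - (19/12) * 0.95, about -0.504

# With k = 1 the factor (n - 1)/(n - k) is 1, so no adjustment is made.
adjusted_r2(0.3, n = 50, k = 1)    # 0.3

# Cross-check against summary.lm() on simulated (hypothetical) data.
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- d$x1 + rnorm(20)
fit <- lm(y ~ x1 + x2, data = d)
s <- summary(fit)
all.equal(s$adj.r.squared,
          adjusted_r2(s$r.squared, n = nrow(d), k = length(coef(fit))))  # TRUE
```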
>>
>>
>> Pinar Aslantas Bostan wrote:
>>> Hi all,
>>>
>>> I am working on a comparison of kriging and regression methods. I
>>> have one dependent (PREC) and seven independent variables. I created
>>> 10 different test and training datasets. I use the training datasets
>>> to build the models and the test datasets to calculate the error (RMSE)
>>> and r-squares. Once I have obtained predicted values for the grid, I use
>>> overlay() to get predictions for the test dataset. For example:
>>>
>>> # regression kriging
>>> # dem is the grid (I want to get predictions for each pixel of dem)
>>> # and dem$rk.pred1 contains the regression kriging predictions
>>>> test1$rk.predicted <- dem$rk.pred1[overlay(dem, test1)]
>>>
>>> # calculating r-square values based on test values
>>>> ss <- (test1$PREC - mean(test1$PREC))^2      # squared deviations from the test mean
>>>> sst1 <- sum(ss)                               # total sum of squares
>>>> e <- (test1$PREC - test1$rk.predicted)^2     # squared prediction errors
>>>> sse.rk <- sum(e)                              # error sum of squares
>>>> rk1.r.square <- 1 - sse.rk / sst1
>>>
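The evaluation workflow Pinar describes (fit on a training subset, score RMSE and R2 on the held-out test subset, repeat ten times) can be sketched in R. Everything here is an assumption for illustration: the data are simulated, the 70/30 split ratio is arbitrary, and a plain linear regression (`lm()`) stands in for both methods, since regression kriging would additionally need a spatial package such as gstat and a prediction grid.

```r
# Hypothetical stand-in for Pinar's data: one response (PREC) and
# seven predictors, all simulated here.
set.seed(42)
n <- 60
dat <- as.data.frame(matrix(rnorm(n * 7), ncol = 7))
names(dat) <- paste0("X", 1:7)
dat$PREC <- 3 * dat$X1 - 2 * dat$X2 + rnorm(n, sd = 2)

# Ten random 70% / 30% train/test splits; for each, fit on the training
# part and score on the test part.
scores <- t(sapply(1:10, function(i) {
  idx   <- sample(n, size = round(0.7 * n))
  train <- dat[idx, ]
  test  <- dat[-idx, ]
  fit   <- lm(PREC ~ ., data = train)
  pred  <- predict(fit, newdata = test)
  sse   <- sum((test$PREC - pred)^2)
  sst   <- sum((test$PREC - mean(test$PREC))^2)  # test-set mean, as in the script above
  c(rmse = sqrt(mean((test$PREC - pred)^2)), r2 = 1 - sse / sst)
}))
scores   # one row per split: RMSE and out-of-sample R2
```

Out-of-sample R2 computed this way is bounded above by 1 but has no lower bound, so individual splits can come out negative even when the model is reasonable on average.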
>>> My problem is that for some datasets the methods result in negative
>>> r-squares. Here I gave an example for regression kriging, but the
>>> same problem may occur for linear regression. I checked the
>>> dependent and independent variables and there is no problem with them.
>>> Does anyone know of another function that could replace overlay() for
>>> the same purpose? (I thought the problem might be caused by the
>>> overlay function.) Or do you have any idea about the reason for the
>>> negative r-square values?
>>>
>>> Best regards,
>>> Pinar
>>>
>>>
>>> ********************************************************************************
>>>
>>> Pinar Aslantas Bostan
>>> Research Assistant
>>> Department of Geodetic and
>>> Geographic Information Technologies (GGIT)
>>> Middle East Technical University
>>> 06531 Ankara/TURKEY
>>> aslantas at metu.edu.tr
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Edzer Pebesma
> Institute for Geoinformatics (ifgi), University of Münster
> Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
> 8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
> http://www.52north.org/geostatistics      e.pebesma at wwu.de