[R-sig-Geo] negative r-squares

Fri Sep 10 10:39:00 CEST 2010

Dear Jason, Edzer and Caspar,

Thank you very much for your suggestions. I feel much better because
now (at least) I know that I didn't make a mistake with the variables
or in implementing the methods. I will try adjusting for degrees of
freedom and using the mean of the training dataset when calculating
errors and r-square. As Edzer said, when the test dataset contains an
outlier (precipitation much higher than the mean, e.g. 2200 mm), I
obtain a negative r-square!

Best regards and good luck with your studies,
Pinar

Quoting Jason Gasper <Jason.Gasper at noaa.gov>:

> I may be a little off base here, but wouldn't an ex-sample R^2
> calculation be required, since you are using your test data as a
> "prediction"? The ex-sample R^2 would be
> 1 - SSE(test data)/sum((Y - mean(Ytrain))^2), where Y are the test
> observations. R^2 in this context is motivated as a comparison
> between two competing models. Thus, a negative R^2 value would
> indicate that your ex-sample (test data) forecasts are worse than
> a mean value.
>
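A minimal sketch of the ex-sample R^2 described above, in R. The
function name oos_r2, its argument names, and the toy numbers are
made up for illustration; they are not from any package or from the
original data.

```r
# Out-of-sample R^2: compare squared errors on the test data against a
# null model that always predicts the mean of the TRAINING response.
# oos_r2 and its arguments are illustrative names, not a package API.
oos_r2 <- function(obs_test, pred_test, obs_train) {
  sse <- sum((obs_test - pred_test)^2)        # model error on the test data
  sst <- sum((obs_test - mean(obs_train))^2)  # null-model error on the test data
  1 - sse / sst
}

# Toy values, invented for this example
y_train <- c(400, 500, 600, 700, 800)
y_test  <- c(450, 650, 900)

perfect <- oos_r2(y_test, y_test, y_train)                # predictions equal observations
null    <- oos_r2(y_test, rep(mean(y_train), 3), y_train) # predict the training mean
worse   <- oos_r2(y_test, c(900, 450, 450), y_train)      # poorer than the training mean
```

Under this definition a perfect forecast gives 1, predicting the
training mean gives 0, and anything that does worse than the training
mean gives a negative value.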
> caspar hallmann wrote:
>> This raises the question, though, of whether one should use the mean
>> of the training data or the mean of the test data in calculating the
>> total sum of squares. I believe the first is fairer with respect to
>> answering whether a given model is any better than the null model at
>> predicting the response. When using an sst based on the mean of the
>> test data, you are essentially comparing your model to a null model
>> that has been based on different data (which I think isn't fair), and
>> it's probably the reason why ss.err > sst, and hence R2 < 0.
>>
>> Caspar
>>
>>
>>
>> On Thu, Sep 9, 2010 at 9:15 PM, Edzer Pebesma
>> <edzer.pebesma at uni-muenster.de> wrote:
>>
>>> Pinar, Jason,
>>>
>>> From the script below it seems no adjustment for degrees of freedom
>>> is being made.
>>>
>>> In this case R2 can become negative because you use different
>>> test and training sets. Suppose the test set contains one single
>>> extreme that is not present in the training set. In that case, the
>>> mean of the test values is, in terms of sum of squares, a better
>>> predictor than your regression model, which didn't know about this
>>> outlier. Don't forget that the mean of the test set does contain
>>> this outlier. Hence, R2 can easily become negative when evaluated
>>> over a different data set than the one the regression model was
>>> derived from.
>>>
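The outlier effect described above can be reproduced with a small
made-up example in R. The predictor name elev and all numbers are
assumptions chosen for illustration (only PREC and the 2200 mm value
echo the thread); this is a sketch, not the original data or script.

```r
# Training data: PREC barely depends on the (hypothetical) predictor elev
train <- data.frame(elev = 1:8,
                    PREC = c(590, 610, 595, 605, 600, 610, 590, 600))
# Test data: three ordinary values plus one extreme (2200 mm)
test <- data.frame(elev = c(2, 4, 6, 8),
                   PREC = c(595, 605, 600, 2200))

fit  <- lm(PREC ~ elev, data = train)   # model never sees the extreme value
pred <- predict(fit, newdata = test)

sse       <- sum((test$PREC - pred)^2)
sst_test  <- sum((test$PREC - mean(test$PREC))^2)   # null model knows the outlier
sst_train <- sum((test$PREC - mean(train$PREC))^2)  # null model does not

r2_test_mean  <- 1 - sse / sst_test    # test-set mean as the reference
r2_train_mean <- 1 - sse / sst_train   # training-set mean as the reference
```

With these numbers the version based on the test mean comes out at
roughly -0.33, while the version based on the training mean is close
to 0: the model is about as good as the training mean, but it cannot
compete with a mean that already contains the outlier.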
>>> On 09/09/2010 06:25 PM, Jason Gasper wrote:
>>>
>>>> Hello Pinar,
>>>>
>>>> I don't know for sure what your calculation is, but R2 values can range
>>>> from -inf to 1 if an adjusted R2 is being used. In other words, one
>>>> possibility is that you're adjusting for degrees of freedom using some
>>>> variation of 1 - ((n-1)/(n-k))(1-R2), where the adjusted R2 equals
>>>> the unadjusted R2 when k=1. So when the estimated R2 is less than or
>>>> equal to 0, the model forecast is no better than the mean (a really
>>>> poor fit). Another way of looking at a negative R2 is that the
>>>> fit is worse than a horizontal line, so the sum-of-squares from the
>>>> model is larger than the sum-of-squares from a horizontal line. Again,
>>>> poor fit.
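A short sketch of the degrees-of-freedom adjustment in R, assuming
that k counts all estimated coefficients including the intercept (the
convention under which k=1 leaves R2 unchanged). The function name
adj_r2 is illustrative; the comparison uses the built-in mtcars data
only as a convenient check against summary.lm().

```r
# Adjusted R^2 for n observations and k estimated coefficients
# (assumption: k includes the intercept, so k = 1 is the mean-only model).
adj_r2 <- function(r2, n, k) {
  1 - (n - 1) / (n - k) * (1 - r2)
}

low_fit_many_terms <- adj_r2(0.05, n = 20, k = 8)  # weak fit, many coefficients
mean_only          <- adj_r2(0.05, n = 20, k = 1)  # k = 1 leaves R^2 unchanged

# Cross-check against the value that summary.lm() reports
fit   <- lm(mpg ~ wt + hp, data = mtcars)
s     <- summary(fit)
by_fn <- adj_r2(s$r.squared, n = nrow(mtcars), k = length(coef(fit)))
```

A weak fit with many coefficients (the first call) yields a negative
adjusted R2 even though the unadjusted value is positive.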
>>>>
>>>> Cheers-Jason
>>>>
>>>>
>>>> Pinar Aslantas Bostan wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am working on a comparison of kriging and regression methods. I
>>>>> have one dependent variable (PREC) and seven independent variables.
>>>>> I created 10 different test and training datasets. I use the
>>>>> training datasets to build the models and the test datasets to
>>>>> calculate the error (RMSE) and r-squares. Once I have prediction
>>>>> values for the grid, I use overlay() to get predictions for the
>>>>> test dataset. For example:
>>>>>
>>>>> # regression kriging
>>>>> # dem is the grid (I want to get predictions for each pixel of dem)
>>>>> # and dem$rk.pred1 contains regression kriging predictions
>>>>>
>>>>>> test1$rk.predicted = dem$rk.pred1[overlay(dem, test1)]
>>>>>>
>>>>> # calculating r-square values based on test values
>>>>>
>>>>>> ss <-(test1$PREC-mean(test1$PREC))*(test1$PREC-mean(test1$PREC))
>>>>>> sst1<-sum(ss)
>>>>>> e <-(test1$PREC-test1$rk.predicted)*(test1$PREC-test1$rk.predicted)
>>>>>> sse.rk<-sum(e)
>>>>>> rk1.r.square<-1-(sse.rk/sst1)
>>>>>>
>>>>> My problem is that, for some datasets, the methods give negative
>>>>> r-squares. Here I gave an example for regression kriging, but the
>>>>> same problem may also occur for linear regression. I checked the
>>>>> dependent and independent variables and there is no problem with
>>>>> them. Does anyone know of another function to use instead of
>>>>> overlay() for the same purpose? (I thought that maybe the problem
>>>>> is caused by the overlay function.) Or do you have any idea about
>>>>> the reason for the negative r-square values?
>>>>>
>>>>> Best regards,
>>>>> Pinar
>>>>>
>>>>>
>>>>> ********************************************************************************
>>>>>
>>>>> Pinar Aslantas Bostan
>>>>> Research Assistant
>>>>> Department of Geodetic and
>>>>> Geographic Information Technologies (GGIT)
>>>>> Middle East Technical University
>>>>> 06531 Ankara/TURKEY
>>>>> aslantas at metu.edu.tr
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>> --
>>> Edzer Pebesma
>>> Institute for Geoinformatics (ifgi), University of Münster
>>> Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
>>> 8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
>>> http://www.52north.org/geostatistics      e.pebesma at wwu.de
>>>
>>
>