[R] Query on R-squared correlation coefficient for linear regression through origin

J C Nash pro|jcn@@h @end|ng |rom gm@||@com
Thu Sep 27 14:43:17 CEST 2018


This issue traces back to the very unfortunate use of R-squared as the name
of a tool that simply compares a model to the model that is a single number
(the mean). The mean can be shown to be the optimal choice for a model that
is a single number, so it makes sense to ask a fitted model to do better
than that.
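A short derivation of why the mean is the best single-number model, and how R-squared uses it as the baseline (notation assumed here: y_i observed values, \hat{y}_i model predictions, \bar{y} the sample mean, RSS the residual sum of squares):

```latex
\begin{aligned}
S(c) &= \sum_{i=1}^{n} (y_i - c)^2, \qquad
S'(c) = -2 \sum_{i=1}^{n} (y_i - c) = 0 \;\Longrightarrow\; c = \bar{y},
\qquad S''(c) = 2n > 0, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
     = 1 - \frac{\mathrm{RSS}(\text{model})}{S(\bar{y})}.
\end{aligned}
```

So R^2 > 0 exactly when the model's residual sum of squares is smaller than that of the best constant model.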

The OP has the correct form, and I find that, whatever the software, when
working with models that do NOT have a constant term in them (e.g., nonlinear
models, regression through the origin) it pays to do the calculation
"manually". In R it is really easy to write the necessary function, so
why take the chance that a software developer has extended the concept
using a personal choice that goes beyond a clear definition?
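A minimal sketch of such a function, assuming only base R and the built-in cars data; the name r2_vs_mean is hypothetical, not an existing R function. It always measures the model against the mean-only model, whatever the form of the model:

```r
# Hypothetical helper: R-squared measured against the "mean-only" model,
# i.e. 1 - RSS / TSS with TSS taken about mean(y), whatever the model form.
r2_vs_mean <- function(y, yhat) {
  stopifnot(length(y) == length(yhat))
  rss <- sum((y - yhat)^2)       # residual sum of squares of the model
  tss <- sum((y - mean(y))^2)    # residual sum of squares of the mean-only model
  1 - rss / tss
}

# Regression through the origin on the built-in cars data
fit0 <- lm(dist ~ 0 + speed, data = cars)
r2_origin <- r2_vs_mean(cars$dist, fitted(fit0))

# Ordinary regression with an intercept, for comparison
fit1 <- lm(dist ~ speed, data = cars)
r2_full <- r2_vs_mean(cars$dist, fitted(fit1))

print(c(through_origin = r2_origin, with_intercept = r2_full))
```

Because it takes only observed and fitted values, the same helper can be applied to any model for which fitted() returns predictions on the scale of y.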

I've commented elsewhere that I use this statistic even for nonlinear
models in my own software, since I think a model should do better than the
mean, but other workers shy away from using it for nonlinear models because
readers may carry over interpretations that hold only for linear models.
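As an illustration only, here is the same comparison with the mean for a nonlinear model fitted by base R's nls(). The power-law form dist = a * speed^b and the use of a log-log linear fit for starting values are assumptions made for this sketch, not a recommended model for these data:

```r
# Illustration: power-law model dist = a * speed^b (an assumed model form).
# Starting values come from a straight-line fit of log(dist) on log(speed).
st  <- coef(lm(log(dist) ~ log(speed), data = cars))
fit <- nls(dist ~ a * speed^b, data = cars,
           start = list(a = exp(unname(st[1])), b = unname(st[2])))

# R-squared against the mean-only model, computed "manually"
y  <- cars$dist
r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
print(r2)
```

Note that summary() of an nls fit does not report an R-squared, so the calculation has to be done by hand in any case.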

JN


On 2018-09-27 06:56 AM, Patrick Barrie wrote:
> I have a query on the R-squared correlation coefficient for linear 
> regression through the origin.
> 
> The general expression for R-squared in regression (whether linear or 
> non-linear) is
> R-squared = 1 - sum((y - ypredicted)^2) / sum((y - ybar)^2)
> 
> However, the lm function within R does not seem to use this expression 
> when the intercept is constrained to be zero. It gives results different
> from Excel and other data analysis packages.
> 
> As an example (using built-in cars dataframe):
> > cars.lm = lm(dist ~ 0 + speed, data = cars)    # linear regression through origin
> > summary(cars.lm)$r.squared                      # report R-squared
> [1] 0.8962893
> > 1 - deviance(cars.lm)/sum((cars$dist - mean(cars$dist))^2)    # calculates R-squared directly
> [1] 0.6018997
> > # The latter corresponds to the value reported by Excel (and other data analysis packages)
> >
> > # Note that we expect R-squared to be smaller for linear regression through the origin
> > # than for linear regression without a constraint (which is 0.6511 in this example)
> 
> Does anyone know what R is doing in this case? Is there an option to get 
> R to return what I termed the "general" expression for R-squared? The 
> adjusted R-squared value is also affected. [Other parameters all seem 
> correct.]
> 
> Thanks for any help on this issue,
> 
> Patrick
> 
> P.S. I believe old versions of Excel (before 2003) also had this issue.
>
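On what R is doing: the help page ?summary.lm defines R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), where y* is the mean of y[i] if there is an intercept and zero otherwise. A minimal sketch, assuming base R and the cars data from the post, that reproduces both reported values by hand (the variable names are illustrative only):

```r
# Reproduce the R-squared values that summary.lm() reports for a model
# fitted without an intercept. Per ?summary.lm, the baseline y* is ZERO
# (not mean(y)) when the model has no intercept.
fit <- lm(dist ~ 0 + speed, data = cars)   # regression through the origin
y   <- cars$dist
rss <- deviance(fit)                       # residual sum of squares
n   <- nobs(fit)                           # number of observations (50)
rdf <- df.residual(fit)                    # residual degrees of freedom (49)

# Uncentered R-squared: the comparison model is y = 0, not y = mean(y)
r2_uncentered <- 1 - rss / sum(y^2)

# summary.lm's adjustment uses n (not n - 1) when there is no intercept
adj_uncentered <- 1 - (1 - r2_uncentered) * (n / rdf)

s <- summary(fit)
print(c(reported = s$r.squared,     by_hand = r2_uncentered))
print(c(reported = s$adj.r.squared, by_hand = adj_uncentered))
```

Replacing sum(y^2) by sum((y - mean(y))^2) gives the "general" value from the post; for its adjusted counterpart, the conventional choice (an assumption here, not something lm offers as an option) is the factor (n - 1) / rdf.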




More information about the R-help mailing list