[R] Query on R-squared correlation coefficient for linear regression through origin

Patrick Barrie pjb10 at cam.ac.uk
Thu Sep 27 12:56:49 CEST 2018


I have a query on the R-squared correlation coefficient for linear 
regression through the origin.

The general expression for R-squared in regression (whether linear or 
non-linear) is
R-squared = 1 - sum((y - ypredicted)^2) / sum((y - ybar)^2)
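
To make the expression above concrete, here is a minimal R sketch of it. The helper name r2_general is hypothetical (it is not part of base R), and the built-in cars data frame is used only as illustration. For an ordinary fit with an intercept it should agree with the value reported by summary().

```r
# Hypothetical helper (not a base R function): the "general" R-squared,
# i.e. 1 - residual sum of squares / total sum of squares about the mean.
r2_general <- function(y, yhat) {
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

# Illustration with an ordinary (unconstrained) linear fit.
fit <- lm(dist ~ speed, data = cars)
r2_general(cars$dist, fitted(fit))   # general expression
summary(fit)$r.squared               # value reported by R
```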

However, the lm function within R does not seem to use this expression 
when the intercept is constrained to be zero. It gives results different 
to Excel and other data analysis packages.

As an example (using built-in cars dataframe):
> cars.lm = lm(dist ~ 0 + speed, data = cars)   # linear regression through origin
> summary(cars.lm)$r.squared   # report R-squared
[1] 0.8962893
> 1 - deviance(cars.lm)/sum((cars$dist - mean(cars$dist))^2)   # calculates R-squared directly
[1] 0.6018997
> # The latter corresponds to the value reported by Excel (and other data analysis packages)
>
> # Note that we expect R-squared to be smaller for linear regression through the origin
> # than for linear regression without a constraint (which is 0.6511 in this example)

Does anyone know what R is doing in this case? Is there an option to get 
R to return what I termed the "general" expression for R-squared? The 
adjusted R-squared value is also affected. [Other parameters all seem 
correct.]
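
One possible explanation can be checked directly. The sketch below assumes that, when the intercept is omitted, summary() measures the total sum of squares about zero rather than about the mean of y. The variable names are illustrative only. If the assumption holds, the "uncentered" value should match the reported one, while the "centered" value is the general expression from the example above.

```r
# Fit through the origin, as in the example above.
fit0 <- lm(dist ~ 0 + speed, data = cars)
y    <- cars$dist
rss  <- sum(residuals(fit0)^2)   # residual sum of squares

# Total sum of squares about the mean (the "general" expression).
r2_centered <- 1 - rss / sum((y - mean(y))^2)

# Total sum of squares about zero (the assumed no-intercept convention).
r2_uncentered <- 1 - rss / sum(y^2)

# Compare both with the value that summary() reports.
c(centered   = r2_centered,
  uncentered = r2_uncentered,
  reported   = summary(fit0)$r.squared)
```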

Thanks for any help on this issue,

Patrick

P.S. I believe old versions of Excel (before 2003) also had this issue.

-- 
Dr Patrick J. Barrie
Department of Chemical Engineering and Biotechnology
University of Cambridge
Philippa Fawcett Drive, Cambridge CB3 0AS
01223 331864
pjb10 using cam.ac.uk

