[R] Stats 101 : lm with/without intercept
Duncan Murdoch
murdoch at stats.uwo.ca
Sat Sep 22 00:00:05 CEST 2007
On 21/09/2007 4:47 PM, Yves Moisan wrote:
> I am puzzled by a regression result. I have a categorical variable,
> ClassePop33000, which bins a Population variable into 3 levels. I want to
> investigate whether that categorical variable is related to my
> dependent variable, so I ran:
>
> lm(Cout.ton ~ ClassePop33000, data=ech2)
>
> Call:
> lm(formula = Cout.ton ~ ClassePop33000, data = ech2)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -182.24   -62.91   -22.76    66.38   277.39
>
> Coefficients:
>                                    Estimate Std. Error t value Pr(>|t|)
> (Intercept)                          231.66      11.50  20.141  < 2e-16 ***
> ClassePop33000[T.[3000,25000)]       -72.91      16.70  -4.366 2.19e-05 ***
> ClassePop33000[T.[25000,10000000)]   -95.17      19.92  -4.777 3.82e-06 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 97.6 on 170 degrees of freedom
> Multiple R-Squared: 0.1502, Adjusted R-squared: 0.1402
> F-statistic: 15.02 on 2 and 170 DF, p-value: 9.818e-07
>
>
> I then discovered that one can omit the intercept and so get a
> coefficient for each of the N levels of the categorical variable. So I ran:
>
> lm(Cout.ton ~ ClassePop33000 + 0, data=ech2)
>
> Call:
> lm(formula = Cout.ton ~ ClassePop33000 + 0, data = ech2)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -182.24   -62.91   -22.76    66.38   277.39
>
> Coefficients:
>                                Estimate Std. Error t value Pr(>|t|)
> ClassePop33000[1,3000)           231.66      11.50  20.141  < 2e-16 ***
> ClassePop33000[3000,25000)       158.75      12.11  13.114  < 2e-16 ***
> ClassePop33000[25000,10000000)   136.49      16.27   8.391  1.8e-14 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 97.6 on 170 degrees of freedom
> Multiple R-Squared: 0.7922, Adjusted R-squared: 0.7885
> F-statistic: 216 on 3 and 170 DF, p-value: < 2.2e-16
>
>
> I tried the very pedagogical examples at
> http://www.stat.umn.edu/geyer/5102/examp/dummy.html, and plotting the
> regression lines with abline gives me exactly the same lines with or
> without the intercept. So why do the R-squared values differ? The
> coefficient p-values are at least all very small, but I don't understand
> the drastic difference in R-squared. Pointers to stats 101, anyone?
The standard definition of R-squared assumes there's an intercept
present. If you suppress it, you need to come up with a new definition.
So those values aren't comparable.
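
The two fits above describe the same group means: 231.66 - 72.91 = 158.75 and 231.66 - 95.17 = 136.49, and the residual summaries are identical. What changes is the baseline that R-squared is measured against. As documented in ?summary.lm, R-squared is 1 - RSS/TSS, where TSS is taken about the mean of the response if there is an intercept and about zero otherwise. The following is a minimal sketch of that point on simulated data; the variables g and y, the group means and the noise level are made up for illustration and are not the values in ech2.

```r
# Hypothetical simulated data (NOT the poster's ech2): a 3-level factor
# and a response whose mean depends on the level.
set.seed(1)
g <- factor(rep(c("small", "medium", "large"), each = 50),
            levels = c("small", "medium", "large"))
y <- c(230, 160, 135)[as.integer(g)] + rnorm(150, sd = 100)

fit1 <- lm(y ~ g)      # with intercept: baseline mean plus two differences
fit0 <- lm(y ~ g + 0)  # without intercept: one mean per level

# Both parameterizations give the same fitted values (hence residuals) ...
stopifnot(isTRUE(all.equal(fitted(fit1), fitted(fit0))))
# ... and the no-intercept coefficients are baseline + difference.
stopifnot(isTRUE(all.equal(
  unname(coef(fit0)),
  unname(coef(fit1)[1] + c(0, coef(fit1)[-1])))))

rss            <- sum(residuals(fit1)^2)
tss_centered   <- sum((y - mean(y))^2)  # sum of squares about the mean
tss_uncentered <- sum(y^2)              # sum of squares about zero

r2_with    <- 1 - rss / tss_centered    # definition used with an intercept
r2_without <- 1 - rss / tss_uncentered  # definition used without one

# These reproduce what summary() reports for each fit.
stopifnot(isTRUE(all.equal(r2_with,    summary(fit1)$r.squared)))
stopifnot(isTRUE(all.equal(r2_without, summary(fit0)$r.squared)))

print(c(with_intercept = r2_with, without_intercept = r2_without))
```

Because the sum of squares about zero is never smaller than the sum of squares about the mean, the no-intercept R-squared comes out larger whenever the response mean is not zero, even though the fit itself has not improved.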
Duncan Murdoch