[R] lm without intercept
Jay Emerson
jayemerson at gmail.com
Fri Feb 18 14:02:16 CET 2011
No, this is a cute problem, though: the definition of R^2 changes
without the intercept, because the "empty" model used for calculating
the total sum of squares always predicts 0 (so the total sum of
squares is the sum of squares of the observations themselves, without
centering around the sample mean).

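To make the change of definition concrete, here is a minimal sketch on simulated data. The x and y values, the seed, and the variable names are made up for illustration and are not the poster's data. It computes R^2 by hand both ways and compares each result with what summary() reports.

```r
set.seed(1)
x <- runif(22, 1000, 2000)          # hypothetical predictor, far from zero
y <- 0.3 * x + rnorm(22, sd = 55)   # hypothetical response

fit1 <- lm(y ~ x)        # model with an intercept
fit0 <- lm(y ~ x - 1)    # model forced through the origin

rss1 <- sum(resid(fit1)^2)   # residual sum of squares, with intercept
rss0 <- sum(resid(fit0)^2)   # residual sum of squares, without intercept

# With an intercept, the baseline model predicts mean(y)
r2_with <- 1 - rss1 / sum((y - mean(y))^2)

# Without an intercept, the baseline model predicts 0
r2_without <- 1 - rss0 / sum(y^2)

# Each pair should agree with the value reported by summary()
c(by_hand = r2_with,    reported = summary(fit1)$r.squared)
c(by_hand = r2_without, reported = summary(fit0)$r.squared)
```

When the response sits far from zero, as in this sketch, sum(y^2) is much larger than the centered sum of squares, so the through-origin R^2 would typically come out far higher even though the residual sum of squares cannot be smaller.
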
Your interpretation of the p-value for the intercept in the first
model is also backwards: 0.9535 is extremely weak evidence against
the hypothesis that the intercept is 0. That is, the intercept might
be near zero, but could also be something very different. With a
standard error of 229, your 95% confidence interval for the intercept
(if you trusted it based on other things) would have a margin of error
of well over 400. If you told me that an intercept of, say, 350 or 400
were consistent with your knowledge of the problem, I wouldn't

This is a very small data set: if you sent an R command such as:

  x <- c(x1, x2, ..., xn)
  y <- c(y1, y2, ..., yn)

you might even get some more interesting feedback. One of the many
good intro stats textbooks might also be helpful as you get up to
speed.

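The margin of error mentioned above can be checked from the numbers in the first summary quoted below. This is a small sketch, with variable names of my own choosing, that assumes the usual t-based interval on the residual degrees of freedom.

```r
estimate     <- 13.5177    # (Intercept) estimate from the first summary
se_intercept <- 229.0764   # its standard error from the same summary
df_resid     <- 20         # residual degrees of freedom from the same summary

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df_resid)

# Margin of error and the resulting interval for the intercept
margin <- t_crit * se_intercept
ci     <- estimate + c(-1, 1) * margin

margin
ci
```

The margin works out to roughly 478, so the interval runs from about -464 to about 491: zero is well inside it, but so are 350 and 400.
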
Original post:
Date: Fri, 18 Feb 2011 11:49:41 +0100
From: Jan <jrheinlaender at gmx.de>
To: "R-help at r-project.org list" <r-help at r-project.org>
Subject: [R] lm without intercept
I am not a statistics expert, so I have this question. A linear model
gives me the following summary:

Call:
lm(formula = N ~ N_alt)

Residuals:
    Min      1Q  Median      3Q     Max
-110.30  -35.80  -22.77   38.07  122.76

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5177   229.0764   0.059   0.9535
N_alt         0.2832     0.1501   1.886   0.0739 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 56.77 on 20 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.151,  Adjusted R-squared: 0.1086
F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386

The regression is not very good (high p-value, low R-squared).
The Pr value for the intercept seems to indicate that it is zero with a
very high probability (95.35%). So I repeat the regression forcing the
intercept to zero:

Call:
lm(formula = N ~ N_alt - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-110.11  -36.35  -22.13   38.59  123.23

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
N_alt 0.292046   0.007742   37.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55.41 on 21 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.9855,  Adjusted R-squared: 0.9848
F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16

1. Is my interpretation correct?
2. Is it possible that just by forcing the intercept to become zero, a
bad regression becomes an extremely good one?
3. Why doesn't lm suggest a value of zero (or near zero) by itself if
the regression is so much better with it?
Please excuse my ignorance.
Jan Rheinländer

John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University