[R] Interaction factor and numeric variable versus separate regressions

Tue Aug 7 17:34:13 CEST 2007

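The single model y ~ x:f fits one intercept shared by all three levels,
so each slope has to adjust to compensate for it. The three separate
models each estimate their own intercept. This is why lev1 looks so
different: its data are flat around 6.8, but a line forced through the
common intercept of about 3.69 must slope upward to reach them.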

Try looking at it graphically and note how the black dotted lines
are all forced to go through the same intercept, i.e. the same point
on the y axis, whereas the red dashed lines are each able to
fit their portion of the data using both the intercept and the slope.

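# Single model: common intercept, one slope per level of f
y.lm <- lm(y ~ x:f, data = d)
plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
for (i in 1:3) {
	# black dotted: line from the single model (shared intercept)
	abline(a = coef(y.lm)[1], b = coef(y.lm)[1 + i], lty = "dotted")
	# red dashed: separate regression for level i
	abline(lm(y ~ x, d[as.numeric(d$f) == i, ]), col = "red", lty = "dashed")
}
grid()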
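
If the goal is for a single model to reproduce the separate fits, one
option is to give each level its own intercept as well as its own slope.
A minimal sketch, assuming data simulated as in the original question
(set.seed and the names y.lm2, sep and joint are my additions for
illustration):

```r
# Simulated data following the original question; set.seed is added
# here only so the example is reproducible
set.seed(1)
d <- data.frame(
  x = rep(1:20, 3),
  f = gl(3, 20, labels = paste("lev", 1:3, sep = ""))
)
d$y <- rnorm(60) + c(rep(6.8, 20), 1:20 * 1.7 + 1, 1:20 * 6.7 + 3.7)

# One intercept and one slope per level of f, no common intercept
y.lm2 <- lm(y ~ f + x:f - 1, data = d)

# Separate fits, one per level: a 2 x 3 matrix (intercept and slope by level)
sep <- sapply(split(d, d$f), function(s) coef(lm(y ~ x, data = s)))

# Arrange the single-model coefficients the same way and compare
joint <- matrix(coef(y.lm2), nrow = 2, byrow = TRUE, dimnames = dimnames(sep))
all.equal(joint, sep)
```

The coefficient estimates agree with the separate regressions. The
standard errors will generally differ, because the single model pools
the residual variance across levels. y ~ f/x - 1 is an equivalent way
to write the same formula.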

On 8/7/07, Sven Garbade <Sven.Garbade at med.uni-heidelberg.de> wrote:
> Dear list members,
>
> I have trouble interpreting the coefficients from an lm model involving
> the interaction of a numeric and a factor variable, compared to separate
> lm models for each level of the factor variable.
>
> ## data:
> y1 <- rnorm(20) + 6.8
> y2 <- rnorm(20) + (1:20*1.7 + 1)
> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> y <- c(y1,y2,y3)
> x <- rep(1:20,3)
> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> d <- data.frame(x=x,y=y, f=f)
>
> ## plot
> # xyplot(y~x|f)
>
> ## lm model with interaction
> summary(lm(y~x:f, data=d))
>
> Call:
> lm(formula = y ~ x:f, data = d)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -2.8109 -0.8302  0.2542  0.6737  3.5383
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.53 on 56 degrees of freedom
> Multiple R-Squared: 0.9984,     Adjusted R-squared: 0.9984
> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
>
> ## separate lm fits
> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
> $lev1
> (Intercept)           x
>  6.77022860 -0.01667528
>
> $lev2
> (Intercept)           x
>   1.019078    1.691982
>
> $lev3
> (Intercept)           x
>   3.274656    6.738396
>
>
> Can anybody give me a hint why the coefficients for the slopes
> (especially for lev1) are so different and how the coefficients from the
> lm model with interaction are related to the separate fits?
>
> Thanks, Sven