[R] Interaction factor and numeric variable versus separate regressions

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 7 18:33:18 CEST 2007


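These are not the same model: y ~ x:f fits one common intercept with a
separate slope for each level, whereas the separate fits each have their
own intercept.  You want y ~ x*f, and then you will find the differences
in intercepts and slopes from group 1 (lev1) as the coefficients.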
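
A minimal sketch of that correspondence, assuming the data are simulated
as in the quoted post (the seed and the object names fit, b and sep are
illustrative choices, not from the original):

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# y ~ x:f has a single intercept shared by all levels plus one slope per level
colnames(model.matrix(y ~ x:f, data = d))

# y ~ x*f expands to 1 + x + f + x:f; with the default treatment contrasts
# the f and x:f coefficients are differences from lev1
fit <- lm(y ~ x * f, data = d)
b <- coef(fit)

# Rebuild the per-level intercepts and slopes from those differences
intercepts <- b["(Intercept)"] + c(0, b["flev2"], b["flev3"])
slopes     <- b["x"]           + c(0, b["x:flev2"], b["x:flev3"])

# Separate fits, one row per level
sep <- t(sapply(split(d, d$f), function(g) coef(lm(y ~ x, data = g))))

# Both comparisons should agree up to numerical tolerance
all.equal(unname(intercepts), unname(sep[, "(Intercept)"]))
all.equal(unname(slopes),     unname(sep[, "x"]))
```

The point estimates agree because x*f gives every level its own intercept
and slope; only the parametrization differs.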

Remember too that the combined model pools the error variance across
groups, whereas the separate fits estimate a separate error variance for
each group.
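
To see the pooling concretely, here is a hedged sketch on data simulated
as in the quoted post (the seed and the names pooled and separate are
assumptions for illustration).  With equal group sizes the pooled
residual variance is the plain average of the per-group residual
variances:

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# Residual standard error of the combined model (54 residual df)
pooled <- summary(lm(y ~ x * f, data = d))$sigma

# Residual standard error of each separate fit (18 residual df each)
separate <- sapply(split(d, d$f),
                   function(g) summary(lm(y ~ x, data = g))$sigma)

pooled
separate

# Each group contributes the same residual df here, so the pooled variance
# equals the unweighted mean of the separate variances
all.equal(pooled^2, mean(separate^2))
```

So the coefficient estimates from y ~ x*f match the separate fits, but
standard errors and tests use the pooled estimate and will generally
differ.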

To understand model formulae, study Bill Venables' exposition in chapter 6 
of MASS.

On Tue, 7 Aug 2007, Sven Garbade wrote:

> Dear list members,
>
> I have trouble interpreting the coefficients from an lm model involving
> the interaction of a numeric and a factor variable, compared with separate
> lm models for each level of the factor variable.
>
> ## data:
> y1 <- rnorm(20) + 6.8
> y2 <- rnorm(20) + (1:20*1.7 + 1)
> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> y <- c(y1,y2,y3)
> x <- rep(1:20,3)
> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> d <- data.frame(x=x,y=y, f=f)
>
> ## plot
> # xyplot(y~x|f)
>
> ## lm model with interaction
> summary(lm(y~x:f, data=d))
>
> Call:
> lm(formula = y ~ x:f, data = d)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -2.8109 -0.8302  0.2542  0.6737  3.5383
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.53 on 56 degrees of freedom
> Multiple R-Squared: 0.9984,	Adjusted R-squared: 0.9984
> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
>
> ## separate lm fits
> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
> $lev1
> (Intercept)           x
> 6.77022860 -0.01667528
>
> $lev2
> (Intercept)           x
>   1.019078    1.691982
>
> $lev3
> (Intercept)           x
>   3.274656    6.738396
>
>
> Can anybody give me a hint why the coefficients for the slopes
> (especially for lev1) are so different and how the coefficients from the
> lm model with interaction are related to the separate fits?
>
> Thanks, Sven
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


