[R] Interaction factor and numeric variable versus separate

Wed Aug 8 14:06:56 CEST 2007

Thanks to all for the very helpful replies & the reference to a chapter
in MASS!

Sven

On Tue, 2007-08-07 at 12:07 -0400, Gabor Grothendieck wrote:
> Also check this post
> 
> https://stat.ethz.ch/pipermail/r-help/2007-May/132866.html
> 
> for a number of formulations.
> 
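
One formulation that returns the per-level intercepts and slopes directly is the nested form `y ~ f/x - 1`. The following is a minimal sketch, assuming data simulated as in the original post. The seed and the object names `nested` and `sep` are my own additions, and this is not necessarily one of the formulations given in the linked post.

```r
set.seed(1)  # assumption: seed added for reproducibility, not in the original post

# Data simulated as in the original post
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# f/x expands to f + f:x; dropping the global intercept gives one
# intercept (flevN) and one slope (flevN:x) per level of f
nested <- lm(y ~ f/x - 1, data = d)
coef(nested)

# Separate fits per level, for comparison:
# a 2 x 3 matrix with rows "(Intercept)" and "x"
sep <- sapply(split(d, d$f), function(dd) coef(lm(y ~ x, data = dd)))
sep
```

The point estimates agree with the separate fits; the standard errors generally differ, because the single model pools the residual variance across all three levels.
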
> On 8/7/07, Ted Harding <ted.harding at nessie.mcc.ac.uk> wrote:
> > On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
> > > In the single model all three levels share the same intercept, which
> > > means that the slopes must change to accommodate it,
> > > whereas in the three separate models they each have their own
> > > intercept.
> >
> > I think this arose because of the formulation of the "model with
> > interaction" as:
> >
> >  summary(lm(y~x:f, data=d))
> >
> > If it had been formulated as
> >
> >  summary(lm(y~x*f, data=d))
> >
> > there would be three separate intercepts, and three different slopes
> > (and the differences would be the same as the differences for the
> > separate models).
> >
> > Ted.
> >
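
To make the correspondence concrete, here is a small sketch of how the `y ~ x*f` coefficients map onto the separate fits. It assumes R's default treatment contrasts and data simulated as in the original post. The seed and the names `full`, `intercepts`, `slopes` and `sep` are illustrative additions, not part of the thread.

```r
set.seed(1)  # assumption: seed added for reproducibility, not in the original post

# Data simulated as in the original post
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# With treatment contrasts the coefficients are:
# (Intercept), x, flev2, flev3, x:flev2, x:flev3
full <- lm(y ~ x * f, data = d)
b <- coef(full)

# lev1 is the reference level; the other levels are reference + difference
intercepts <- c(lev1 = b[["(Intercept)"]],
                lev2 = b[["(Intercept)"]] + b[["flev2"]],
                lev3 = b[["(Intercept)"]] + b[["flev3"]])
slopes <- c(lev1 = b[["x"]],
            lev2 = b[["x"]] + b[["x:flev2"]],
            lev3 = b[["x"]] + b[["x:flev3"]])
rbind(intercepts, slopes)

# Separate fits per level, for comparison
sep <- sapply(split(d, d$f), function(dd) coef(lm(y ~ x, data = dd)))
sep
```

The two printed tables should agree up to numerical precision, since the `x*f` model places no constraint linking the three lines.
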
> > > Try looking at it graphically and note how the black dotted lines
> > > are all forced to go through the same intercept, i.e. the same point
> > > on the y axis, whereas the red dashed lines are each able to
> > > fit their portion of the data using both the intercept and the slope.
> > >
> > > y.lm <- lm(y~x:f, data=d)
> > > plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
> > > for(i in 1:3) {
> > >       abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
> > >       abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty =
> > > "dashed")
> > > }
> > > grid()
> > >
> > >
> > > On 8/7/07, Sven Garbade <Sven.Garbade at med.uni-heidelberg.de> wrote:
> > >> Dear list members,
> > >>
> > > >> I have problems interpreting the coefficients from an lm model
> > > >> involving the interaction of a numeric and a factor variable, compared
> > > >> to separate lm models for each level of the factor variable.
> > >>
> > >> ## data:
> > >> y1 <- rnorm(20) + 6.8
> > >> y2 <- rnorm(20) + (1:20*1.7 + 1)
> > >> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> > >> y <- c(y1,y2,y3)
> > >> x <- rep(1:20,3)
> > >> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> > >> d <- data.frame(x=x,y=y, f=f)
> > >>
> > >> ## plot
> > >> # xyplot(y~x|f)
> > >>
> > >> ## lm model with interaction
> > >> summary(lm(y~x:f, data=d))
> > >>
> > >> Call:
> > >> lm(formula = y ~ x:f, data = d)
> > >>
> > >> Residuals:
> > >>    Min      1Q  Median      3Q     Max
> > >> -2.8109 -0.8302  0.2542  0.6737  3.5383
> > >>
> > >> Coefficients:
> > >>            Estimate Std. Error t value Pr(>|t|)
> > >> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
> > >> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
> > >> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
> > >> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
> > >> ---
> > >> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >>
> > >> Residual standard error: 1.53 on 56 degrees of freedom
> > >> Multiple R-Squared: 0.9984,     Adjusted R-squared: 0.9984
> > >> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
> > >>
> > >> ## separate lm fits
> > >> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
> > >> $lev1
> > >> (Intercept)           x
> > >>  6.77022860 -0.01667528
> > >>
> > >> $lev2
> > >> (Intercept)           x
> > >>   1.019078    1.691982
> > >>
> > >> $lev3
> > >> (Intercept)           x
> > >>   3.274656    6.738396
> > >>
> > >>
> > > >> Can anybody give me a hint why the coefficients for the slopes
> > > >> (especially for lev1) are so different, and how the coefficients from
> > > >> the lm model with interaction are related to the separate fits?
> > >>
> > >> Thanks, Sven