[R] Interaction factor and numeric variable versus separate

Gabor Grothendieck ggrothendieck at gmail.com
Tue Aug 7 18:07:23 CEST 2007


Also check this post

https://stat.ethz.ch/pipermail/r-help/2007-May/132866.html

for a number of formulations.
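
As a minimal sketch of two such formulations (not taken from the linked
post), the code below reuses the data-generating code quoted further down
and checks that both y ~ x*f and y ~ 0 + f + x:f reproduce the separate
per-level fits. The seed and the object names sep, full and direct are
assumptions made only for this illustration.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# Same data-generating scheme as in the quoted question
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# Separate fits: one row per level, columns are intercept and slope
sep <- t(sapply(split(d, d$f), function(dd) coef(lm(y ~ x, data = dd))))

# y ~ x*f with the default treatment contrasts: lev1 is the baseline and
# the remaining coefficients are differences from it.
# Coefficient order: (Intercept), x, flev2, flev3, x:flev2, x:flev3
b <- unname(coef(lm(y ~ x * f, data = d)))
full <- cbind(intercept = b[1] + c(0, b[3], b[4]),
              slope     = b[2] + c(0, b[5], b[6]))

# y ~ 0 + f + x:f: three intercepts first, then three slopes,
# so no sums are needed
b2 <- unname(coef(lm(y ~ 0 + f + x:f, data = d)))
direct <- cbind(intercept = b2[1:3], slope = b2[4:6])

# Both should agree with the separate fits up to numerical tolerance
all.equal(unname(sep), unname(full))
all.equal(unname(sep), unname(direct))
```

With y ~ x:f, by contrast, only one intercept is estimated, so neither
comparison would hold.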

On 8/7/07, Ted Harding <ted.harding at nessie.mcc.ac.uk> wrote:
> On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
> > In the single model all three levels share the same intercept, which
> > means that the slopes must change to accommodate it,
> > whereas in the three separate models they each have their own
> > intercept.
>
> I think this arose because of the formulation of the "model with
> interaction" as:
>
>  summary(lm(y~x:f, data=d))
>
> If it had been formulated as
>
>  summary(lm(y~x*f, data=d))
>
> there would be three separate intercepts, and three different slopes
> (and the differences would be the same as the differences for the
> separate models).
>
> Ted.
>
> > Try looking at it graphically and note how the black dotted lines
> > are all forced to go through the same intercept, i.e. the same point
> > on the y axis, whereas the red dashed lines are each able to
> > fit their portion of the data using both the intercept and the slope.
> >
> > y.lm <- lm(y~x:f, data=d)
> > plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
> > for(i in 1:3) {
> >       abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
> >       abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty = "dashed")
> > }
> > grid()
> >
> >
> > On 8/7/07, Sven Garbade <Sven.Garbade at med.uni-heidelberg.de> wrote:
> >> Dear list members,
> >>
> >> I have trouble interpreting the coefficients from a lm model involving
> >> the interaction of a numeric and a factor variable, compared to separate
> >> lm models for each level of the factor variable.
> >>
> >> ## data:
> >> y1 <- rnorm(20) + 6.8
> >> y2 <- rnorm(20) + (1:20*1.7 + 1)
> >> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> >> y <- c(y1,y2,y3)
> >> x <- rep(1:20,3)
> >> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> >> d <- data.frame(x=x,y=y, f=f)
> >>
> >> ## plot
> >> # xyplot(y~x|f)
> >>
> >> ## lm model with interaction
> >> summary(lm(y~x:f, data=d))
> >>
> >> Call:
> >> lm(formula = y ~ x:f, data = d)
> >>
> >> Residuals:
> >>    Min      1Q  Median      3Q     Max
> >> -2.8109 -0.8302  0.2542  0.6737  3.5383
> >>
> >> Coefficients:
> >>            Estimate Std. Error t value Pr(>|t|)
> >> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
> >> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
> >> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
> >> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
> >> ---
> >> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>
> >> Residual standard error: 1.53 on 56 degrees of freedom
> >> Multiple R-Squared: 0.9984,     Adjusted R-squared: 0.9984
> >> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
> >>
> >> ## separate lm fits
> >> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
> >> $lev1
> >> (Intercept)           x
> >>  6.77022860 -0.01667528
> >>
> >> $lev2
> >> (Intercept)           x
> >>   1.019078    1.691982
> >>
> >> $lev3
> >> (Intercept)           x
> >>   3.274656    6.738396
> >>
> >>
> >> Can anybody give me a hint why the coefficients for the slopes
> >> (especially for lev1) are so different and how the coefficients from
> >> the lm model with interaction are related to the separate fits?
> >>
> >> Thanks, Sven
> >>
> >> ______________________________________________
> >> R-help at stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Aug-07                                       Time: 17:01:33
> ------------------------------ XFMail ------------------------------
>


