[R] Interaction factor and numeric variable versus separate

Tue Aug 7 18:01:37 CEST 2007

On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
> In the single model all three levels share the same intercept, which
> means that the slope must change to accommodate it,
> whereas in the three separate models they each have their own
> intercept.

I think this arose because of the formulation of the "model with
interaction" as:

  summary(lm(y~x:f, data=d))

If it had been formulated as

  summary(lm(y~x*f, data=d))

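there would be three separate intercepts and three separate slopes.
With R's default treatment contrasts these are reported as the
intercept and slope for lev1 plus differences for lev2 and lev3
(and those differences would be the same as the differences between
the separate models).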
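
As a rough check of the above, here is a minimal sketch using data
simulated the same way as in Sven's post. The set.seed() call and the
object names (fit.full, int.full, slope.full, sep) are my own additions
for illustration. It rebuilds the per-level intercepts and slopes from
the y~x*f coefficients and puts them next to the separate fits:

```r
## Simulated data in the same shape as Sven's example
## (the seed is an assumption, added only for reproducibility)
set.seed(1)
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

## y ~ x*f expands to 1 + x + f + x:f
fit.full <- lm(y ~ x * f, data = d)
cf <- coef(fit.full)

## With the default treatment contrasts, lev1 is the baseline and the
## lev2/lev3 terms are differences from it
int.full   <- cf["(Intercept)"] + c(0, cf["flev2"], cf["flev3"])
slope.full <- cf["x"] + c(0, cf["x:flev2"], cf["x:flev3"])

## Separate fits, one per level of f (rows: lev1, lev2, lev3)
sep <- t(sapply(split(d, d$f), function(dd) coef(lm(y ~ x, data = dd))))

## Side by side: the columns should agree pairwise
cbind(sep, int.full = int.full, slope.full = slope.full)

## Same fit, parametrised so that the coefficients are directly the
## per-level intercepts (flev1..3) and slopes (flev1:x..flev3:x)
coef(lm(y ~ 0 + f + f:x, data = d))
```

The last formula is only a reparametrisation of y~x*f: the fitted
values are the same, but the coefficients can be read off without
adding up differences.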

Ted.

> Try looking at it graphically and note how the black dotted lines
> are all forced to go through the same intercept, i.e. the same point
> on the y axis, whereas the red dashed lines are each able to
> fit their portion of the data using both the intercept and the slope.
> 
> y.lm <- lm(y~x:f, data=d)
> plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
> for(i in 1:3) {
>       abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
>       abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red",
>              lty = "dashed")
> }
> grid()
> 
> 
> On 8/7/07, Sven Garbade <Sven.Garbade at med.uni-heidelberg.de> wrote:
>> Dear list members,
>>
>> I am having trouble interpreting the coefficients from an lm model
>> involving the interaction of a numeric and a factor variable,
>> compared with separate lm models for each level of the factor
>> variable.
>>
>> ## data:
>> y1 <- rnorm(20) + 6.8
>> y2 <- rnorm(20) + (1:20*1.7 + 1)
>> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
>> y <- c(y1,y2,y3)
>> x <- rep(1:20,3)
>> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
>> d <- data.frame(x=x,y=y, f=f)
>>
>> ## plot
>> # xyplot(y~x|f)
>>
>> ## lm model with interaction
>> summary(lm(y~x:f, data=d))
>>
>> Call:
>> lm(formula = y ~ x:f, data = d)
>>
>> Residuals:
>>    Min      1Q  Median      3Q     Max
>> -2.8109 -0.8302  0.2542  0.6737  3.5383
>>
>> Coefficients:
>>            Estimate Std. Error t value Pr(>|t|)
>> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
>> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
>> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
>> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 1.53 on 56 degrees of freedom
>> Multiple R-Squared: 0.9984,     Adjusted R-squared: 0.9984
>> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
>>
>> ## separate lm fits
>> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
>> $lev1
>> (Intercept)           x
>>  6.77022860 -0.01667528
>>
>> $lev2
>> (Intercept)           x
>>   1.019078    1.691982
>>
>> $lev3
>> (Intercept)           x
>>   3.274656    6.738396
>>
>>
>> Can anybody give me a hint why the coefficients for the slopes
>> (especially for lev1) are so different and how the coefficients from
>> the lm model with interaction are related to the separate fits?
>>
>> Thanks, Sven
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07                                       Time: 17:01:33
------------------------------ XFMail ------------------------------