[R] Interaction factor and numeric variable versus separate
(Ted Harding)
ted.harding at nessie.mcc.ac.uk
Tue Aug 7 18:01:37 CEST 2007
On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
> In the single model all three levels share the same intercept which
> means that the slope must change to accommodate it
> whereas in the three separate models they each have their own
> intercept.
I think this arose because of the formulation of the "model with
interaction" as:
summary(lm(y~x:f, data=d))
If it had been formulated as
summary(lm(y~x*f, data=d))
there would be three separate intercepts, and three different slopes
(and the differences would be the same as the differences for the
separate models).
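
To make the correspondence concrete, here is a minimal sketch that reuses the data-generating code from the original post. The call to set.seed() and the object names cf, joint, separate and direct are illustrative assumptions, not part of the thread. It rebuilds the per-level intercepts and slopes from the y~x*f coefficients (which use R's default treatment contrasts) and compares them with the three separate fits.

```r
# Simulated data as in the original post; set.seed() is an added assumption
set.seed(1)
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# Full interaction model: one intercept and one slope per level of f
cf <- coef(lm(y ~ x * f, data = d))

# Under treatment contrasts, lev1 is the baseline and the other
# coefficients are differences from it
joint <- rbind(
  lev1 = c(cf["(Intercept)"],               cf["x"]),
  lev2 = c(cf["(Intercept)"] + cf["flev2"], cf["x"] + cf["x:flev2"]),
  lev3 = c(cf["(Intercept)"] + cf["flev3"], cf["x"] + cf["x:flev3"])
)
colnames(joint) <- c("(Intercept)", "x")

# Separate fits, one per level of f
separate <- t(sapply(split(d, d$f), function(s) coef(lm(y ~ x, data = s))))

# Alternative parameterisation: dropping the common intercept makes lm()
# report the three intercepts followed by the three slopes directly
direct <- coef(lm(y ~ 0 + f + x:f, data = d))

print(joint)
print(separate)
print(all.equal(joint, separate, check.attributes = FALSE))
```

The residual standard error will still differ between the joint and the separate fits, because the joint model pools the residual variance over all three levels.
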
Ted.
> Try looking at it graphically and note how the black dotted lines
> are all forced to go through the same intercept, i.e. the same point
> on the y axis, whereas the red dashed lines are each able to
> fit their portion of the data using both the intercept and the slope.
>
> y.lm <- lm(y~x:f, data=d)
> plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
> for(i in 1:3) {
> abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
> abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty = "dashed")
> }
> grid()
>
>
> On 8/7/07, Sven Garbade <Sven.Garbade at med.uni-heidelberg.de> wrote:
>> Dear list members,
>>
>> I have problems interpreting the coefficients from an lm model involving
>> the interaction of a numeric and a factor variable, compared to separate
>> lm models for each level of the factor variable.
>>
>> ## data:
>> y1 <- rnorm(20) + 6.8
>> y2 <- rnorm(20) + (1:20*1.7 + 1)
>> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
>> y <- c(y1,y2,y3)
>> x <- rep(1:20,3)
>> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
>> d <- data.frame(x=x,y=y, f=f)
>>
>> ## plot
>> # xyplot(y~x|f)
>>
>> ## lm model with interaction
>> summary(lm(y~x:f, data=d))
>>
>> Call:
>> lm(formula = y ~ x:f, data = d)
>>
>> Residuals:
>>     Min      1Q  Median      3Q     Max
>> -2.8109 -0.8302  0.2542  0.6737  3.5383
>>
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
>> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
>> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
>> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 1.53 on 56 degrees of freedom
>> Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984
>> F-statistic: 1.191e+04 on 3 and 56 DF, p-value: < 2.2e-16
>>
>> ## separate lm fits
>> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
>> $lev1
>> (Intercept)           x
>>  6.77022860 -0.01667528
>>
>> $lev2
>> (Intercept)           x
>>    1.019078    1.691982
>>
>> $lev3
>> (Intercept)           x
>>    3.274656    6.738396
>>
>>
>> Can anybody give me a hint why the coefficients for the slopes
>> (especially for lev1) are so different and how the coefficients from
>> the lm model with interaction are related to the separate fits?
>>
>> Thanks, Sven
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07 Time: 17:01:33
------------------------------ XFMail ------------------------------