# [R] Help - linear regression

Mark Difford mark_difford at yahoo.co.uk
Fri Jan 25 18:43:28 CET 2008

```Hi All,

Thanjavur wrote:
>> model2<-lm(lavi~age+sex+age*race+diabetes+hypertension, data=tb1)

David wrote:
>>  in the second equation you are only including the interaction of
>> age*race,
>>  the main effect of age, but not the main effect of race which is what
>> came out significant

I am sorry, but this is wrong.  Read up about model formulae in
http://cran.r-project.org/doc/manuals/R-intro.html#Statistical-models-in-R

The expression age * race expands to age + race + age:race.  That is, main
effects of age and race, plus the interaction between age and race
[age:race].  The expansion is done automatically.
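
## A quick way to check the expansion yourself (a minimal sketch; it uses
## only the formula from the original post, so no data are needed):
f <- lavi ~ age + sex + age*race + diabetes + hypertension
attr(terms(f), "term.labels")
## The labels returned include both "race" and "age:race".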

Thanjavur: Model selection is a huge subject.  However, once you have taken in
the above fact, you will see that the __only__ difference between your two
models is that you have added an interaction term for age:race.  You have two
simple, but still very effective, approaches.

## 1: Test the two models by doing:
anova(model1, model2)

## 2: Use stepAIC (you need MASS installed) on model2, and see what happens
##    to the interaction term
require(MASS)
stepAIC(model2, test="Chi")$anova

See:
?anova
?stepAIC
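
## A self-contained sketch of both approaches on simulated data.  The data
## frame below is made up purely for illustration: the variable names follow
## the original post, but the values and effect sizes are assumptions, and
## sim, m1 and m2 are hypothetical names chosen so as not to clash with tb1,
## model1 and model2.
set.seed(1)
n <- 200
sim <- data.frame(
  age          = rnorm(n, mean = 56, sd = 10),
  sex          = factor(sample(c("female", "male"), n, replace = TRUE)),
  race         = factor(sample(c("european", "non-european"), n, replace = TRUE)),
  diabetes     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  hypertension = factor(sample(c("no", "yes"), n, replace = TRUE))
)
## Simulated response with main effects of age and race, no interaction
sim$lavi <- 25 + 0.1 * sim$age + 2 * (sim$race == "non-european") +
  rnorm(n, sd = 5)

m1 <- lm(lavi ~ age + sex + race + diabetes + hypertension, data = sim)
m2 <- lm(lavi ~ age + sex + age*race + diabetes + hypertension, data = sim)

## Approach 1: F test for the single extra term, age:race
anova(m1, m2)

## Approach 2: stepwise selection by AIC, starting from the larger model
require(MASS)
stepAIC(m2, test = "Chi", trace = 0)$anova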

HTH,
Mark.

David Young-18 wrote:
>
> Thanjavur,
>
> I'm new to R, so it is possible I'm interpreting your syntax
> incorrectly, but it looks like in the second equation you are only
> including the interaction of age*race, the main effect of age, but
> not the main effect of race which is what came out significant in your
> first model.
>
> In effect you have measured two different things and one of them is
> significant.  In the first regression you have measured a general
> shift in the regression giving each racial group a different
> intercept.  In the second, you are measuring whether there should be
> two different slopes for the line relating to age.  One for european
> ages and one for non-european ages, which did not turn out to be
> significant.
>
> Based on the information you have presented you should not include the
> interaction, but should include the main effect for race.  HOWEVER, as
> a general rule, you should include the main effects along with your
> test for interactions between them.  age,race,age*race
> When you do this it is possible that the interaction will then also be
> significant.
>
> Hope that helps.
>
> Dave
>
> Tuesday, January 22, 2008, 11:20:01 AM, you wrote:
>
>
> TB> Hi,
>
> TB> I am trying a linear regression model where the dependent variable is
> the size of the heart corrected for the patient's height and weight. This
> is labelled as LAVI. The independent variables are
> TB> race (european or non-european), age, sex (male or female) of the
> patient and whether they have diabetes and high blood pressure. sample
> size 2000 patients selected from a community.
>
> TB> when I model
> TB> model1<-lm(lavi~age+sex+race+diabetes+hypertension, data=tb1)
> TB>  and
> TB> model2<-lm(lavi~age+sex+age*race+diabetes+hypertension, data=tb1)
>
> TB> in the first model race comes out as a significant predictor (p<0.005)
> whereas in the second model race is not a significant predictor of lavi
> (p=.076)
>
> TB> in my dataset mean age is 55.2 years in the non-europeans and 56.7
> years in the europeans (p <0.0001 by t.test).
>
> TB> should I or should I not include the interaction (age*race) in the
> model. Is it an acceptable rule to put in interactions if there is a
> significant relation between the independent variables in
> TB> univariate analyses.
>
> TB> Many thanks
>
> --
> Best regards,
>
> David Young
>                             mailto:dyoung at telefonica.net
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help