Regression query: steps for model building

Devshruti Pahuja devshruti at hotmail.com
Fri Jun 11 19:41:00 CEST 2004

```
Hi,

I have a data set with both quantitative and categorical predictors.
After scaling the response variable, I checked for multicollinearity (VIF
values) among the predictors and removed the predictors that were hiding
some of the other significant predictors. I'm curious to know whether
predictors that are not significant in a simple 'lm' fit can still be
involved in interactions. How do I take into account interactions involving
the predictors that I removed solely on the basis of multicollinearity?

I would appreciate it if someone could shed some light on this matter and
on how to use R to detect interactions effectively.

Thanks

Regards
Dev

> ------Final 'lm model'--------------------
> > logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit + rbi + walk +
>     obp + strike.out + free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> > summary(logmodelfull_minus_run_hr_walk_batting)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
>     free.agent.eligible + free.agent.1991 + arbitr.elgible.)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.41786 -0.28911 -0.02814  0.31890  1.49007
>
> Coefficients:
>                       Estimate Std. Error t value Pr(>|t|)
> (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> hit                   0.004479   0.001158   3.867 0.000133 ***
> rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> walk                  0.005421   0.002206   2.457 0.014533 *
> obp                  -1.385584   0.824105  -1.681 0.093653 .
> strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.5351 on 328 degrees of freedom
> Multiple R-Squared: 0.7981,     Adjusted R-squared: 0.7932
> F-statistic: 162.1 on 8 and 328 DF,  p-value: < 2.2e-16
>
> --------------------------------------------------------------------------
>
>
> ------with interactions--------------------
>
> >
> > summary(baseball.lgmodel_with_interactions_ALL_arbid)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
>     free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
>     hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
>     rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
>     strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
>     strike.out * run + strike.out * hr + hit * free.agent.eligible +
>     free.agent.eligible * run + hit * free.agent.1991 + strike.out *
>     free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
>     obp + arbitr.elgible. * run + batting * double + obp * run +
>     obp * hr + walk * stolen.base + hit * arbitr.1991 +
>     free.agent.eligible * double + arbitr.elgible. * double +
>     strike.out * triple +
>     triple * batting + triple * walk + triple * walk + hit *
>     hr + rbi * hr + free.agent.eligible * hr + free.agent.1991 *
>     hr + arbitr.elgible. * hr + hr * arbitr.1991 + hit * walk +
>     free.agent.eligible * walk + walk * rbi + rbi * stolen.base +
>     strike.out * stolen.base + stolen.base * batting + stolen.base *
>     walk + stolen.base * rbi + stolen.base * walk + arbitr.elgible. *
>     error)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.29352 -0.28287 -0.03748  0.29790  1.31590
>
> Coefficients:
>                                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> hit                              6.927e-03  6.226e-03   1.112 0.266889
> rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> run                              4.932e-02  2.905e-02   1.698 0.090682 .
> hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> double                          -5.259e-02  4.489e-02  -1.172 0.242349
> walk                             1.395e-02  9.757e-03   1.430 0.153808
> stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> error                           -4.097e-03  6.879e-03  -0.595 0.552007
> hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.4925 on 280 degrees of freedom
> Multiple R-Squared: 0.854,      Adjusted R-squared: 0.8248
> F-statistic: 29.24 on 56 and 280 DF,  p-value: < 2.2e-16
>

```
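
One possible way to approach the question in the post is to compare nested models: keep the reduced model, add back a predictor that was dropped for its high VIF together with its interaction, and test the added terms jointly with a partial F-test. The sketch below is only an illustration under assumptions. The data frame `dat` and its columns (`y`, `x1`, `x2`, `g`) are simulated stand-ins, not the salary data from the post, and `vif_manual` is a hypothetical helper written here in base R rather than a function from any package.

```r
# Simulated stand-in data (hypothetical; NOT the salary data from the post).
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)        # x2 is strongly collinear with x1
g  <- factor(rbinom(n, 1, 0.5))      # two-level categorical predictor
# By construction, x2 affects y only when g == "1" (a pure interaction).
y  <- 1 + 0.5 * x1 + 0.8 * x2 * (g == "1") + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2, g)

# Hypothetical helper: VIF of each numeric predictor, computed as
# 1 / (1 - R^2) from regressing it on the other listed predictors.
vif_manual <- function(data, vars) {
  sapply(vars, function(v) {
    others <- setdiff(vars, v)
    fit <- lm(reformulate(others, response = v), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}
vifs <- vif_manual(dat, c("x1", "x2"))

# Reduced model: x2 was dropped because of its high VIF.
fit_reduced <- lm(y ~ x1 + g, data = dat)

# Candidate model: bring x2 back together with its interaction with g.
fit_inter <- update(fit_reduced, . ~ . + x2 + x2:g)

# Partial F-test: do x2 and x2:g jointly improve on the reduced model?
cmp <- anova(fit_reduced, fit_inter)

print(vifs)
print(cmp)
```

In this simulated setting, `x2` would look redundant next to `x1` on VIF grounds, yet the joint test can still flag the `x2:g` term. For screening several candidate terms at once, base R's `add1()` with a `scope` formula and `test = "F"` follows the same idea. With many candidate interactions, as in the second model in the post, the individual p-values should be read with multiple testing in mind.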