[R] Regression query : steps for model building
Devshruti Pahuja
devshruti at hotmail.com
Fri Jun 11 19:41:00 CEST 2004
Hi,
I have a data set with both quantitative and categorical predictors.
After scaling the response variable, I checked for multicollinearity
(VIF values) among the predictors and removed the predictors that were
masking some of the other significant predictors. I'm curious to know
whether predictors that are not significant in a simple 'lm' fit can
still be involved in interactions. How do I take into account
interactions involving the predictors that I removed solely on the
basis of multicollinearity?
I would appreciate it if someone could shed some light on this matter
and on how to use R to detect interactions effectively.
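To make the question concrete, here is a minimal sketch of the workflow
described above, run on simulated data rather than the baseball data. It
computes VIFs in base R as 1 / (1 - R^2), fits a main-effects model with
one collinear predictor dropped, and then uses anova() to test whether
putting that predictor back together with an interaction improves the
fit. The names x1, x2, x3, g and the helper vif_base are hypothetical and
only serve the illustration.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)           # nearly collinear with x1
x3 <- rnorm(n)
g  <- factor(rbinom(n, 1, 0.5))         # categorical predictor, levels "0" and "1"
y  <- 1 + 0.5 * x1 + 0.3 * x3 + 0.8 * (g == "1") * x2 + rnorm(n)
dat <- data.frame(y, x1, x2, x3, g)

# VIF of each numeric predictor: regress it on the others, then 1 / (1 - R^2)
vif_base <- function(data, vars) {
  sapply(vars, function(v) {
    others <- setdiff(vars, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_base(dat, c("x1", "x2", "x3"))
print(round(vifs, 2))

# Main-effects model with x2 dropped because of its high VIF
fit_main <- lm(y ~ x1 + x3 + g, data = dat)

# Same model with x2 restored along with its interaction with g
fit_int <- update(fit_main, . ~ . + x2 + x2:g)

# Partial F test: do x2 and x2:g jointly improve on the main-effects fit?
cmp <- anova(fit_main, fit_int)
print(cmp)
```

A predictor dropped for collinearity can still matter through an
interaction, so a comparison like this keeps it in the candidate set
instead of discarding it for good.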
Thanks
Regards
Dev
> ------Final 'lm model'--------------------
> > logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit + rbi + walk +
>     obp + strike.out + free.agent.eligible + free.agent.1991 + arbitr.elgible.)
> > summary(logmodelfull_minus_run_hr_walk_batting)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
> free.agent.eligible + free.agent.1991 + arbitr.elgible.)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.41786 -0.28911 -0.02814  0.31890  1.49007
>
> Coefficients:
>                       Estimate Std. Error t value Pr(>|t|)
> (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> hit                   0.004479   0.001158   3.867 0.000133 ***
> rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> walk                  0.005421   0.002206   2.457 0.014533 *
> obp                  -1.385584   0.824105  -1.681 0.093653 .
> strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.5351 on 328 degrees of freedom
> Multiple R-Squared: 0.7981, Adjusted R-squared: 0.7932
> F-statistic: 162.1 on 8 and 328 DF, p-value: < 2.2e-16
>
> ------------------------------------------------------------------------------
>
> --------------with interactions----------------------------------------------
>
> > summary(baseball.lgmodel_with_interactions_ALL_arbid)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
> free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
> hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
> rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
> strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
> strike.out * run + strike.out * hr + hit * free.agent.eligible +
> free.agent.eligible * run + hit * free.agent.1991 + strike.out *
> free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
> obp + arbitr.elgible. * run + batting * double + obp * run +
> obp * hr + walk * stolen.base + hit * arbitr.1991 + free.agent.eligible *
> double + arbitr.elgible. * double + strike.out * triple +
> triple * batting + triple * walk + triple * walk + hit *
> hr + rbi * hr + free.agent.eligible * hr + free.agent.1991 *
> hr + arbitr.elgible. * hr + hr * arbitr.1991 + hit * walk +
> free.agent.eligible * walk + walk * rbi + rbi * stolen.base +
> strike.out * stolen.base + stolen.base * batting + stolen.base *
> walk + stolen.base * rbi + stolen.base * walk + arbitr.elgible. *
> error)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.29352 -0.28287 -0.03748  0.29790  1.31590
>
> Coefficients:
>                                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> hit                              6.927e-03  6.226e-03   1.112 0.266889
> rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> run                              4.932e-02  2.905e-02   1.698 0.090682 .
> hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> double                          -5.259e-02  4.489e-02  -1.172 0.242349
> walk                             1.395e-02  9.757e-03   1.430 0.153808
> stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> error                           -4.097e-03  6.879e-03  -0.595 0.552007
> hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.4925 on 280 degrees of freedom
> Multiple R-Squared: 0.854, Adjusted R-squared: 0.8248
> F-statistic: 29.24 on 56 and 280 DF, p-value: < 2.2e-16
>
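Rather than typing every product term by hand as in the call above, one
option is to let add1() screen all two-way interactions among the terms
already in a main-effects model. The sketch below does this on simulated
data; the variables x1, x3, g and the object names are hypothetical, and
the scope formula ~ .^2 simply means "all pairwise interactions of the
current terms".

```r
set.seed(2)
n   <- 200
dat <- data.frame(
  x1 = rnorm(n),
  x3 = rnorm(n),
  g  = factor(rbinom(n, 1, 0.5))        # categorical predictor, levels "0" and "1"
)
# Simulated response with a true x1:g interaction
dat$y <- 1 + 0.5 * dat$x1 + 0.3 * dat$x3 +
  0.8 * (dat$g == "1") * dat$x1 + rnorm(n)

fit_main <- lm(y ~ x1 + x3 + g, data = dat)

# One F test per candidate two-way interaction, each added singly
screen <- add1(fit_main, scope = ~ .^2, test = "F")
print(screen)
```

Each row of the result is the test for adding that single interaction to
the main-effects model. With many candidates some small p-values will
appear by chance, so the table is probably best read as a rough ranking
of which interactions deserve a closer look.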