[R] Regression query

Peter Flom flom at ndri.org
Sun Jun 13 17:03:12 CEST 2004


If variables are collinear, then looking at interactions among them
doesn't make much sense.  High collinearity means that one variable is
nearly a linear combination of others.  IOW, that variable is not adding
much information.  So, if you look at the interaction, you are ALMOST
looking at a quadratic (e.g., if the collinearity involves only 2
variables, then one is very similar to the other, so X1*X2 is almost
X1*X1).  The output will be confusing, to say the least. 
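
To make the "almost a quadratic" point concrete, here is a minimal sketch
on simulated data (the variables x1 and x2 are made up for illustration and
have nothing to do with the baseball data below). It checks how closely the
interaction term tracks the square of x1 when x2 is nearly a copy of x1,
and computes the VIF for x1 by hand as 1 / (1 - R^2) from regressing x1 on
the other predictor.

```r
# Hypothetical simulated data: x2 is x1 plus a little noise, so the two
# predictors are highly collinear.
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 10, sd = 2)
x2 <- x1 + rnorm(n, sd = 0.1)

# Correlation between the two predictors, and between the interaction
# term x1*x2 and the quadratic term x1^2.
r_x1_x2      <- cor(x1, x2)
r_inter_quad <- cor(x1 * x2, x1^2)

# VIF for x1 computed by hand: regress x1 on the other predictor and
# take 1 / (1 - R^2).
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

print(c(cor_x1_x2 = r_x1_x2,
        cor_interaction_vs_square = r_inter_quad,
        vif_x1 = vif_x1))
```

With predictors this close, the interaction column carries essentially the
same information as the squared term, which is why the fitted interaction
is so hard to interpret.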

Worse, when you include collinear variables, the resulting equation is
highly sensitive to small (sometimes very small) changes in the data. 
Belsley gives an example where changes in the third decimal place result
in totally different equations.
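
This is not Belsley's example, just a rough sketch of the same effect on
simulated data (all names here are hypothetical). Two nearly identical
predictors are fitted with lm(), then one of them is perturbed at about
the third decimal place and the model is refitted. The individual slopes
are poorly determined and will typically move a lot between the two fits,
while their sum stays close to the true value of 2.

```r
# Hypothetical simulated data with two almost identical predictors.
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.001)          # nearly a copy of x1
y  <- 1 + x1 + x2 + rnorm(n, sd = 0.1)   # true slopes are 1 and 1

fit_a <- lm(y ~ x1 + x2)

# Perturb x2 at roughly the third decimal place and refit.
x2_b  <- x2 + rnorm(n, sd = 0.001)
fit_b <- lm(y ~ x1 + x2_b)

# Compare the two fitted equations side by side.
print(rbind(original = coef(fit_a), perturbed = coef(fit_b)))

# The standard error of the x2 slope shows how weakly it is determined.
se_x2 <- summary(fit_a)$coefficients["x2", "Std. Error"]
print(se_x2)

# The sum of the two slopes is well determined even though each one is not.
sum_a <- sum(coef(fit_a)[c("x1", "x2")])
sum_b <- sum(coef(fit_b)[c("x1", "x2_b")])
print(c(sum_original = sum_a, sum_perturbed = sum_b))
```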

For details see Belsley's book titled something like "collinearity and
weak data in regression" (sorry, the book and my files are at the
office, but this should let you find it).

HTH

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)


>>> "Devshruti Pahuja" <devshruti at hotmail.com> 06/11/04 5:35 AM >>>
Hi

I have a set of data with both quantitative and categorical predictors.
After scaling the response variable, I looked for multicollinearity (VIF
values) among the predictors and removed the predictors that were hiding
some of the other significant predictors. I'm curious to know whether the
predictors that are not significant in a simple 'lm' fit can still be
involved in interactions. How do I take into account interactions of those
predictors that I removed just on the basis of multicollinearity?

 I'd appreciate it if someone could shed some light on this matter and on
how to use R to detect the interactions effectively.

Thanks

 Regards
 Dev

> ------Final 'lm model'--------------------
> > logmodelfull_minus_run_hr_walk_batting <- lm(log(salary) ~ hit + rbi +
> >     walk + obp + strike.out + free.agent.eligible + free.agent.1991 +
> >     arbitr.elgible.)
> > summary(logmodelfull_minus_run_hr_walk_batting)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + walk + obp + strike.out +
>     free.agent.eligible + free.agent.1991 + arbitr.elgible.)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.41786 -0.28911 -0.02814  0.31890  1.49007
>
> Coefficients:
>                       Estimate Std. Error t value Pr(>|t|)
> (Intercept)           5.340782   0.251218  21.260  < 2e-16 ***
> hit                   0.004479   0.001158   3.867 0.000133 ***
> rbi                   0.011102   0.002195   5.059 7.05e-07 ***
> walk                  0.005421   0.002206   2.457 0.014533 *
> obp                  -1.385584   0.824105  -1.681 0.093653 .
> strike.out           -0.005399   0.001438  -3.755 0.000205 ***
> free.agent.eligible1  1.611521   0.080657  19.980  < 2e-16 ***
> free.agent.19911     -0.301243   0.103481  -2.911 0.003848 **
> arbitr.elgible.1      1.293059   0.086696  14.915  < 2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.5351 on 328 degrees of freedom
> Multiple R-Squared: 0.7981,     Adjusted R-squared: 0.7932
> F-statistic: 162.1 on 8 and 328 DF,  p-value: < 2.2e-16
>
>
> ----------------------------------------------------
>
>
> --------------with interactions--------------------
>
> >
> > summary(baseball.lgmodel_with_interactions_ALL_arbid)
>
> Call:
> lm(formula = log(salary) ~ hit + rbi + strike.out + free.agent.eligible +
>     free.agent.1991 + arbitr.elgible. + hit * free.agent.1991 +
>     hit * arbitr.elgible. + hit * rbi + rbi * free.agent.eligible +
>     rbi * arbitr.elgible. + rbi * arbitr.1991 + hit * strike.out +
>     strike.out * free.agent.eligible + strike.out * arbitr.elgible. +
>     strike.out * run + strike.out * hr + hit * free.agent.eligible +
>     free.agent.eligible * run + hit * free.agent.1991 + strike.out *
>     free.agent.1991 + free.agent.1991 * batting + free.agent.1991 *
>     obp + arbitr.elgible. * run + batting * double + obp * run +
>     obp * hr + walk * stolen.base + hit * arbitr.1991 +
>     free.agent.eligible * double + arbitr.elgible. * double +
>     strike.out * triple +
>     triple * batting + triple * walk + triple * walk + hit *
>     hr + rbi * hr + free.agent.eligible * hr + free.agent.1991 *
>     hr + arbitr.elgible. * hr + hr * arbitr.1991 + hit * walk +
>     free.agent.eligible * walk + walk * rbi + rbi * stolen.base +
>     strike.out * stolen.base + stolen.base * batting + stolen.base *
>     walk + stolen.base * rbi + stolen.base * walk + arbitr.elgible. *
>     error)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.29352 -0.28287 -0.03748  0.29790  1.31590
>
> Coefficients:
>                                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)                      5.217e+00  3.467e-01  15.048  < 2e-16 ***
> hit                              6.927e-03  6.226e-03   1.112 0.266889
> rbi                              1.908e-02  1.150e-02   1.658 0.098350 .
> strike.out                      -5.692e-03  4.586e-03  -1.241 0.215517
> free.agent.eligible1             1.287e+00  2.259e-01   5.699 3.05e-08 ***
> free.agent.19911                 3.828e-01  6.575e-01   0.582 0.560914
> arbitr.elgible.1                 1.038e+00  2.195e-01   4.726 3.63e-06 ***
> arbitr.19911                    -1.024e+00  4.392e-01  -2.331 0.020443 *
> run                              4.932e-02  2.905e-02   1.698 0.090682 .
> hr                              -1.093e-01  7.208e-02  -1.516 0.130543
> batting                         -1.814e-01  2.558e+00  -0.071 0.943522
> obp                             -1.375e+00  2.253e+00  -0.610 0.542099
> double                          -5.259e-02  4.489e-02  -1.172 0.242349
> walk                             1.395e-02  9.757e-03   1.430 0.153808
> stolen.base                     -1.685e-02  4.299e-02  -0.392 0.695372
> triple                          -1.367e-01  1.600e-01  -0.854 0.393807
> error                           -4.097e-03  6.879e-03  -0.595 0.552007
> hit:free.agent.19911             8.248e-04  4.611e-03   0.179 0.858174
> hit:arbitr.elgible.1             4.873e-03  6.448e-03   0.756 0.450395
> hit:rbi                         -1.382e-04  7.709e-05  -1.792 0.074184 .
> rbi:free.agent.eligible1         5.352e-03  9.555e-03   0.560 0.575855
> rbi:arbitr.elgible.1            -3.384e-03  1.136e-02  -0.298 0.766072
> rbi:arbitr.19911                 3.596e-02  2.179e-02   1.650 0.100046
> hit:strike.out                   5.480e-06  5.446e-05   0.101 0.919917
> strike.out:free.agent.eligible1 -2.570e-03  4.282e-03  -0.600 0.548890
> strike.out:arbitr.elgible.1     -9.703e-04  5.234e-03  -0.185 0.853068
> strike.out:run                   1.685e-04  1.246e-04   1.352 0.177345
> strike.out:hr                   -3.088e-04  2.277e-04  -1.356 0.176229
> hit:free.agent.eligible1        -1.359e-03  6.224e-03  -0.218 0.827363
> free.agent.eligible1:run         1.248e-02  9.109e-03   1.370 0.171917
> strike.out:free.agent.19911     -1.851e-02  5.974e-03  -3.099 0.002140 **
> free.agent.19911:batting         7.076e-01  6.200e+00   0.114 0.909215
> free.agent.19911:obp            -1.421e+00  3.952e+00  -0.360 0.719394
> arbitr.elgible.1:run            -8.541e-03  8.773e-03  -0.974 0.331100
> batting:double                   2.346e-01  1.609e-01   1.458 0.145884
> run:obp                         -1.825e-01  7.492e-02  -2.436 0.015462 *
> hr:obp                           3.687e-01  2.116e-01   1.742 0.082608 .
> walk:stolen.base                -6.789e-05  1.557e-04  -0.436 0.663083
> hit:arbitr.19911                -5.835e-03  7.084e-03  -0.824 0.410808
> free.agent.eligible1:double     -1.151e-02  1.663e-02  -0.692 0.489334
> arbitr.elgible.1:double          2.169e-03  1.938e-02   0.112 0.910985
> strike.out:triple               -8.106e-04  6.023e-04  -1.346 0.179475
> batting:triple                   5.179e-01  5.599e-01   0.925 0.355841
> walk:triple                      8.755e-04  9.262e-04   0.945 0.345349
> hit:hr                          -3.320e-04  2.626e-04  -1.264 0.207180
> rbi:hr                           4.748e-04  3.015e-04   1.575 0.116414
> free.agent.eligible1:hr          1.840e-02  2.313e-02   0.796 0.426972
> free.agent.19911:hr              7.216e-02  1.889e-02   3.819 0.000165 ***
> arbitr.elgible.1:hr              4.111e-02  2.803e-02   1.467 0.143564
> arbitr.19911:hr                 -2.368e-02  4.647e-02  -0.510 0.610723
> hit:walk                         3.173e-05  7.826e-05   0.405 0.685442
> free.agent.eligible1:walk       -5.423e-03  4.984e-03  -1.088 0.277472
> rbi:walk                        -7.569e-05  1.313e-04  -0.577 0.564598
> rbi:stolen.base                  3.980e-05  1.605e-04   0.248 0.804409
> strike.out:stolen.base          -2.611e-04  1.615e-04  -1.617 0.107004
> batting:stolen.base              1.552e-01  1.434e-01   1.082 0.280020
> arbitr.elgible.1:error           3.930e-03  1.390e-02   0.283 0.777495
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.4925 on 280 degrees of freedom
> Multiple R-Squared: 0.854,      Adjusted R-squared: 0.8248
> F-statistic: 29.24 on 56 and 280 DF,  p-value: < 2.2e-16
>

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



