[R] dealing with multicollinearity
Manuel Gutierrez
manuel_gutierrez_lopez at yahoo.es
Mon Apr 11 12:22:55 CEST 2005
I have a linear model y ~ x1 + x2 of some data in which the coefficient
for x1 is higher than I would have expected from theory (0.88 fitted vs.
0.70 expected). I wondered whether this could be an artifact of x1 and x2
being correlated, even though the variance inflation factor is not very
high (1.065).
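(With only two predictors the VIF is determined by their correlation,
VIF = 1/(1 - r^2), so it can be cross-checked by hand; a minimal sketch,
assuming the car package is available for vif():

library(car)                 # assumed available; provides vif() for lm fits
vif(lm(y ~ x1 + x2))         # should report ~1.065 for both predictors
1 / (1 - cor(x1, x2)^2)      # with two predictors this equals the VIF

With r = -0.249 this gives 1/(1 - 0.062) = 1.066, matching the figure above.)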
I used perturbation analysis to evaluate collinearity:

library(perturb)
# Refit the model many times, each time adding random normal(0,1) noise
# to x1 and x2 (prange sets the perturbation size for each variable):
P <- perturb(A, pvars = c("x1", "x2"), prange = c(1, 1))
> summary(P)

Perturb variables:
x1      normal(0,1)
x2      normal(0,1)

Impact of perturbations on coefficients:
               mean   s.d.     min     max
(Intercept) -26.067  0.270 -27.235 -25.481
x1            0.726  0.025   0.672   0.882
x2            0.060  0.011   0.037   0.082
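(The perturb package also provides colldiag(), which reports condition
indexes and variance-decomposition proportions; a sketch of that
complementary check on the same fitted model:

cd <- colldiag(A)   # condition indexes from the model matrix
print(cd)           # indexes above roughly 30 are the usual warning sign

)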
The perturbed fits give a mean for x1 of 0.726, which is closer to what
theory predicts. I am not a statistical expert, so I would like to know
whether my evaluation of the effects of collinearity is correct and, if
so, what I can do to obtain a reliable linear model.
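(One check that occurs to me, shown as a sketch below: refit the model
without x2 and see how far the x1 estimate moves from 0.88.

# Sketch: if x2 were distorting the x1 coefficient through collinearity,
# dropping x2 should shift the x1 estimate noticeably.
A1 <- lm(y ~ x1)
coef(summary(A1))["x1", ]   # compare against 0.88202 from the full model

)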
Thanks,
Manuel
Some more detailed information:
> A <- lm(y ~ x1 + x2)
> summary(A)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
      Min        1Q    Median        3Q       Max
-4.221946 -0.484055 -0.004762  0.397508  2.542769

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -27.23472    0.27996 -97.282  < 2e-16 ***
x1            0.88202    0.02475  35.639  < 2e-16 ***
x2            0.08180    0.01239   6.604 2.53e-10 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.823 on 241 degrees of freedom
Multiple R-Squared: 0.8411,  Adjusted R-squared: 0.8398
F-statistic: 637.8 on 2 and 241 DF,  p-value: < 2.2e-16
> cor.test(x1, x2)

        Pearson's product-moment correlation

data:  x1 and x2
t = -3.9924, df = 242, p-value = 8.678e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3628424 -0.1269618
sample estimates:
      cor
-0.248584