[R] Linear multivariate regression with Robust error

Daniel Malter daniel at umd.edu
Fri Jun 10 20:26:20 CEST 2011


I am with Michael. It is almost impossible to figure out what you are
trying to do. However, like Michael, I assume that you regress y on x2 and
find, say, a negative effect, but when you regress y on x1 and x2, you find
a positive effect of x2. The short answer to your question is that in this
case your restricted model (the one containing only x2) suffers from
omitted variable bias. Here is an example:

Let's assume you are interested in the effect of x2 in this example! Let's
say we have 100 observations and that y depends on x1 and x2. Furthermore,
let us assume that x1 and x2 are positively correlated. 

set.seed(1) #for reproducibility

x1=rnorm(100)

x2=x1+rnorm(100) #x2 is correlated with x1

e=rnorm(100) #random error term

y=-3*x1+x2+e #dependent variable
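With this setup we can check the assumed positive correlation directly
(under this data-generating process the population correlation is
1/sqrt(2), about 0.71):

cor(x1,x2) #should be roughly 0.7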



Note that x1 has a negative relationship to y, but x2 has a positive
relationship to y. Note also that the effect of x1 on y is larger in size
(minus 3) than the effect of x2 on y (positive 1). Now let's run some
regressions.

First, let's run y on x1 only. An unbiased estimate should reproduce the
coefficient of -3 within the confidence interval. However, the estimated
coefficient on x1 is much smaller in magnitude than we would expect. The
reason is that because we omit x2, x1 picks up some of the positive effect
of x2 (x1 and x2 are correlated). Hence the coefficient on x1 is biased
toward zero.

reg1<-lm(y~x1)
summary(reg1)
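To see where the estimate should land, plug the simulated data into the
textbook omitted-variable-bias formula (a sketch; the true coefficients -3
and 1 come from the simulation above): when x2 is omitted, the coefficient
on x1 converges to beta1 + beta2*cov(x1,x2)/var(x1), which is
-3 + 1*1/1 = -2 in the population.

-3 + 1*cov(x1,x2)/var(x1) #sample analogue; close to -2, not -3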


Now, let's run y on x2. An unbiased estimate should reproduce the
coefficient of 1 within the confidence interval. However, the estimated
effect of x2 is negative and significant. Obviously, the estimate for x2 is
severely biased. The reasons are the following. First, x2 correlates with
x1, so when you regress y only on x2, the coefficient picks up some of the
effect of x1 on y. This generally leads to a biased estimate of the
coefficient for x2. The coefficient has the opposite sign from the one it
is supposed to have (rather than being just a little biased, like the
coefficient on x1 in the previous regression) because (1) x1 and x2 are
positively correlated, (2) x1 has a negative effect on y while x2 has a
positive one (opposite signs), and (3) the effect of x1 is much larger in
size than the effect of x2.

reg2<-lm(y~x2)
summary(reg2)
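The same formula shows why the sign flips here: omitting x1, the
coefficient on x2 converges to beta2 + beta1*cov(x1,x2)/var(x2) =
1 + (-3)*(1/2) = -0.5 in the population (var(x2) = 2 under the simulation).
The large negative effect of x1 leaking in through the correlation
overwhelms x2's own positive effect.

1 + (-3)*cov(x1,x2)/var(x2) #sample analogue; close to -0.5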


Hence, if we account for both x1 and x2 in our regression of y, both
coefficients should be consistently estimated, because we no longer omit
important predictors of y that are correlated with each other.

reg3<-lm(y~x1+x2)
summary(reg3)
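As a quick check, the 95% confidence intervals from the full model should
cover the true values -3 and 1 (confint() is base R):

confint(reg3) #intervals should contain -3 (x1) and 1 (x2)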

Ta-dah! Problem solved (most likely). So the answer to your question is
that the "correct" coefficient is likely the one from the model that
includes the other control variables. You should read up on "omitted
variable bias." If that is not the problem, you have to give us more
information and reproducible code.

Hope that helps,
Daniel
