[R] Collinearity in Moderated Multiple Regression
Michael Friendly
friendly at yorku.ca
Wed Aug 4 15:07:15 CEST 2010
haenlein at gmail.com wrote:
> I'm sorry -- I think I chose a bad example. Let me start over again:
>
> I want to estimate a moderated regression model of the following form:
> y = a*x1 + b*x2 + c*x1*x2 + e
>
> Based on my understanding, including an interaction term (x1*x2) in the
> regression in addition to x1 and x2 leads to issues of multicollinearity,
> as x1*x2 is likely to covary to some degree with x1 (and x2). One
> recommendation I have seen in this context is to use mean centering, but
> apparently this does not solve the problem (see Echambadi, Raj and James
> D. Hess (2007), "Mean-centering does not alleviate collinearity problems
> in moderated multiple regression models," Marketing Science, 26 (3),
> 438-445). So my question is: which R function can I use to estimate this
> type of model?
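To answer the direct question first: no special function is needed. lm()
estimates this model directly, and the formula y ~ x1 * x2 expands to the
two main effects plus their product:

model <- lm(y ~ x1 * x2)   # equivalent to y ~ x1 + x2 + x1:x2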
I haven't read that article, but there are many others that demonstrate
that, in cases of *structural* collinearity (polynomial models, models
with interaction terms), mean centering of the Xs, particularly in the
product terms, *does* reduce the impact of collinearity to usually
acceptable levels.
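A minimal sketch of that structural part, using a reconstructed version
of the 3 x 3 design from the original post (the actual values were lost
in the archive, so the numbers are only illustrative):

x1 <- rep(1:3, each = 3)
x2 <- rep(1:3, times = 3)
cor(x1, x1 * x2)       # ~0.68: the raw product is collinear with x1
x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
cor(x1c, x1c * x2c)    # 0 -- exactly zero here because this design is
                       # symmetric; centering removes the structural part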
Plus, if you mean-center the Xs themselves, their coefficients have more
sensible interpretations (the slope for X1 at the mean of X2, rather than
at X2 = 0, which may lie outside the range of the data).
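A quick check that centering changes only the parameterization, not the
model itself (continuing the sketch above, with y as arbitrary noise just
for illustration):

y <- rnorm(9)
fit.raw <- lm(y ~ x1 * x2)      # raw predictors
fit.ctr <- lm(y ~ x1c * x2c)    # mean-centered predictors
all.equal(fitted(fit.raw), fitted(fit.ctr))  # TRUE: identical fits
coef(fit.raw)["x1"]    # slope of x1 when x2 = 0
coef(fit.ctr)["x1c"]   # slope of x1 at the mean of x2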
-Michael
>
> On Aug 3, 2010 3:42pm, David Winsemius <dwinsemius at comcast.net> wrote:
>> I think you are attributing to "collinearity" a problem that is due to
>> your small sample size. You are predicting 9 points with 3 predictor
>> terms, and incorrectly concluding that there is some "inconsistency"
>> because you get an R^2 that is above some number you deem surprising.
>> (I got values between 0.2 and 0.4 on several runs.)
>
>> Try:
>
>> ## (a plausible reconstruction -- the archive's HTML-to-text conversion
>> ## ate everything after the "<-" arrows; here the 3 x 3 design is
>> ## replicated 10 times, so n = 90)
>> x1 <- rep(rep(1:3, each = 3), 10)
>> x2 <- rep(rep(1:3, times = 3), 10)
>> x3 <- x1 * x2
>> y <- rnorm(90)                  # y is pure noise, unrelated to the x's
>> model <- lm(y ~ x1 + x2 + x3)
>> summary(model)
>
>> # Multiple R-squared: 0.04269
>> # (with pure noise, E[R^2] is about p/(n-1): 3/89 = 0.034 here, versus
>> # 3/8 = 0.375 with only the original 9 points)
>
>> --
>> David.
>
>> On Aug 3, 2010, at 9:10 AM, Michael Haenlein wrote:
>
>> Dear all,
>
>> I have one dependent variable y and two independent variables x1 and x2
>> which I would like to use to explain y. x1 and x2 are design factors in
>> an experiment and are not correlated with each other. For example,
>> assume that:
>
>> x1 <- rep(1:3, each = 3)   # reconstruction: the actual assignments
>> x2 <- rep(1:3, times = 3)  # were lost in the archive; any uncorrelated
>> cor(x1, x2)                # design will do -- this one gives 0
>
>> The problem is that I want to analyze not only the effect of x1 and x2
>> on y but also that of their interaction x1*x2. Evidently this
>> interaction term has a substantial correlation with both x1 and x2:
>
>> x3 <- x1 * x2
>> cor(x1, x3)   # ~0.68 with the reconstructed design above
>> cor(x2, x3)   # ~0.68
>
>> I therefore expect that a simple regression of y on x1, x2 and x1*x2
>> will lead to biased results due to multicollinearity. For example, even
>> when y is completely random and unrelated to x1 and x2, I obtain a
>> substantial R^2 for a simple linear model which includes all three
>> variables. This evidently does not make sense:
>
>> y <- rnorm(9)   # reconstruction: y is pure noise
>> model <- lm(y ~ x1 + x2 + x3)
>> summary(model)
>
>> Is there some function within R or in some separate library that allows
>> me to estimate such a regression without obtaining inconsistent results?
>
>> Thanks for your help in advance,
>
>> Michael
>
>> Michael Haenlein
>> Associate Professor of Marketing
>> ESCP Europe
>> Paris, France
>
>> David Winsemius, MD
>> West Hartford, CT
>
--
Michael Friendly        Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University         Voice: 416 736-5115 x66249  Fax: 416 736-5814
4700 Keele Street       Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA