[R] Collinearity in Moderated Multiple Regression
Liaw, Andy
andy_liaw at merck.com
Tue Aug 3 17:04:12 CEST 2010
If the collinearity you're seeing arose from the addition of a product
(interaction) term, I do not think penalization is the best answer.
What is the goal of your analysis? If it's prediction, then I wouldn't
worry about this type of collinearity. If you're interested in
inference, I'd try some transformation to reduce (but not necessarily
eliminate) the effect of collinearity. Mean centering is the simplest,
but not the only thing you can do.
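A minimal sketch of the centering idea on simulated data (the data, sample size, and nonzero predictor means are illustrative assumptions, not taken from the thread):

```r
set.seed(1)
x1 <- rnorm(100, mean = 5)   # predictors with nonzero means, so the
x2 <- rnorm(100, mean = 3)   # raw product x1*x2 correlates with each
y  <- rnorm(100)             # outcome unrelated to the predictors

x1c <- x1 - mean(x1)         # mean-centered copies
x2c <- x2 - mean(x2)

cor(x1, x1 * x2)             # large correlation in the raw metric
cor(x1c, x1c * x2c)          # much smaller after centering

fit <- lm(y ~ x1c + x2c + I(x1c * x2c))
summary(fit)
```

Centering changes the correlation between the predictors and the product term (and hence the standard errors of the first-order coefficients), but, as noted above, it does not change the fitted model itself.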
Just my $0.02...
Andy
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Michael Haenlein
Sent: Tuesday, August 03, 2010 10:44 AM
To: 'Nikhil Kaza'
Cc: r-help at r-project.org
Subject: Re: [R] Collinearity in Moderated Multiple Regression
Thanks very much -- it seems that ridge regression can do what I'm
looking for!
Best,
Michael
-----Original Message-----
From: Nikhil Kaza [mailto:nikhil.list at gmail.com]
Sent: Tuesday, August 03, 2010 16:21
To: haenlein at gmail.com
Cc: r-help at r-project.org
Subject: Re: [R] Collinearity in Moderated Multiple Regression
My usual strategy for dealing with multicollinearity is to drop the
offending variable or transform one of them. I would also check the vif
functions in the car and Design packages.
I think you are looking for lm.ridge in the MASS package.
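A short sketch of both suggestions on simulated data (the data and the lambda grid are illustrative assumptions, not from the thread):

```r
library(MASS)                       # provides lm.ridge and select()

set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 * x2                       # the interaction term
y  <- rnorm(100)                    # illustrative outcome only

# ridge regression over a grid of penalties; select() reports the
# HKB, L-W and smallest-GCV choices of lambda for a ridgelm fit
fit <- lm.ridge(y ~ x1 + x2 + x3, lambda = seq(0, 10, by = 0.1))
select(fit)

# variance inflation factors (requires the car package installed):
# library(car); vif(lm(y ~ x1 + x2 + x3))
```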
Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina
nikhil.list at gmail.com
On Aug 3, 2010, at 9:51 AM, haenlein at gmail.com wrote:
> I'm sorry -- I think I chose a bad example. Let me start over again:
>
> I want to estimate a moderated regression model of the following form:
> y = a*x1 + b*x2 + c*x1*x2 + e
>
> Based on my understanding, including an interaction term (x1*x2) into
> the regression in addition to x1 and x2 leads to issues of
> multicollinearity, as x1*x2 is likely to covary to some degree with x1
> (and x2). One recommendation I have seen in this context is to use
> mean centering, but apparently this does not solve the problem (see:
> Echambadi, Raj and James D. Hess (2007), "Mean-centering does not
> alleviate collinearity problems in moderated multiple regression
> models," Marketing Science, 26 (3), 438-45). So my question is:
> Which R function can I use to estimate this type of model?
>
> Sorry for the confusion caused due to my previous message,
>
> Michael
>
> On Aug 3, 2010 3:42pm, David Winsemius <dwinsemius at comcast.net> wrote:
>> I think you are attributing to "collinearity" a problem that is due
>> to your small sample size. You are predicting 9 points with 3
>> predictor terms, and incorrectly concluding that there is some
>> "inconsistency" because you get an R^2 that is above some number you
>> deem surprising. (I got values between 0.2 and 0.4 on several runs.)
>
>
>
>> Try:
>
>> # (assignment operators and right-hand sides were stripped by the
>> # list archive; a plausible reconstruction with a larger sample:)
>> x1 <- rnorm(100)
>> x2 <- rnorm(100)
>> x3 <- x1 * x2
>> y <- rnorm(100)
>> model <- lm(y ~ x1 + x2 + x3)
>> summary(model)
>
>> # Multiple R-squared: 0.04269
>
>
>
>> --
>
>> David.
>
>
>
>> On Aug 3, 2010, at 9:10 AM, Michael Haenlein wrote:
>
>> Dear all,
>
>> I have one dependent variable y and two independent variables x1 and
>> x2 which I would like to use to explain y. x1 and x2 are design
>> factors in an experiment and are not correlated with each other. For
>> example assume that:
>
>> # (assignments stripped by the archive; e.g.)
>> x1 <- rnorm(9)
>> x2 <- rnorm(9)
>> cor(x1, x2)
>
>
>
>> The problem is that I do not only want to analyze the effect of x1
>> and x2 on y but also of their interaction x1*x2. Evidently this
>> interaction term has a substantial correlation with both x1 and x2:
>
>> x3 <- x1 * x2
>> cor(x1, x3)
>> cor(x2, x3)
>
>
>
>> I therefore expect that a simple regression of y on x1, x2 and x1*x2
>> will lead to biased results due to multicollinearity. For example,
>> even when y is completely random and unrelated to x1 and x2, I obtain
>> a substantial R2 for a simple linear model which includes all three
>> variables. This evidently does not make sense:
>
>> # (assignments stripped by the archive; e.g.)
>> y <- rnorm(9)
>> model <- lm(y ~ x1 + x2 + x3)
>> summary(model)
>
>
>
>> Is there some function within R or in some separate library that
>> allows me to estimate such a regression without obtaining
>> inconsistent results?
>
>
>
>> Thanks for your help in advance,
>
>
>
>> Michael
>
>
>
>
>
>> Michael Haenlein
>
>> Associate Professor of Marketing
>
>> ESCP Europe
>
>> Paris, France
>
>
>
>
>
>
>> ______________________________________________
>
>> R-help at r-project.org mailing list
>
>> https://stat.ethz.ch/mailman/listinfo/r-help
>
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>> David Winsemius, MD
>
>> West Hartford, CT
>
>
>
>
>