[Rd] lm() gives different results to lm.ridge() and SPSS
Nick Brown
nick.brown at free.fr
Sat May 6 01:49:18 CEST 2017
Hi John,
Thanks for the comment... but that appears to mean that SPSS has a big problem. I have always been told that the only way to include an interaction term in an SPSS regression is to compute the product by hand. But then it seems to be impossible to stop SPSS from re-standardizing the variable that corresponds to the interaction term. Am I missing something? Is there a way to run the regression with the interaction in SPSS without computing the interaction as a separate variable?
Best,
Nick
----- Original Message -----
From: "John Fox" <jfox at mcmaster.ca>
To: "Nick Brown" <nick.brown at free.fr>, "peter dalgaard" <pdalgd at gmail.com>
Cc: r-devel at r-project.org
Sent: Friday, 5 May, 2017 8:22:53 PM
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
Dear Nick,
On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown"
<r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:
>>I conjecture that something in the vicinity of
>> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>>summary(res)
>> would reproduce the SPSS Beta values.
>
>Yes, that works. Thanks!
That you have to work hard in R to match the SPSS results isn’t such a bad
thing when you factor in the observation that standardizing the
interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of
its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense.
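A minimal sketch with simulated data (not the thread's dataset; all names
below are made up for illustration) shows why: scale(x1 * x2) and
scale(x1) * scale(x2) are different regressors, so the interaction
coefficient changes depending on which one goes into the model.

library(MASS)  # for mvrnorm

set.seed(123)
## Two correlated predictors, mimicking the r ~ .75 reported in the thread
X  <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .75, .75, 1), 2))
x1 <- X[, 1]; x2 <- X[, 2]
y  <- x1 + x2 + 0.5 * x1 * x2 + rnorm(1000)

## Interaction standardized separately (what SPSS appears to do):
coef(lm(y ~ scale(x1) + scale(x2) + scale(x1 * x2)))

## Product of the standardized components (what lm()'s x1:x2 term gives):
coef(lm(y ~ scale(x1) * scale(x2)))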
Best,
John
-------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/
>
>
>----- Original Message -----
>
>From: "peter dalgaard" <pdalgd at gmail.com>
>To: "Viechtbauer Wolfgang (SP)"
><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown"
><nick.brown at free.fr>
>Cc: r-devel at r-project.org
>Sent: Friday, 5 May, 2017 3:33:29 PM
>Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>
>Thanks, I was just getting around to trying this, but got sidetracked by
>actual work...
>
>Your analysis reproduces the SPSS unscaled estimates. It still remains to
>figure out how Nick got
>
> > coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>    (Intercept)       ZMEAN_PA  ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>     0.07342198    -0.39650356    -0.36569488            -0.09435788
>
>
>which does not match your output. I suspect that ZMEAN_PA and
>ZDIVERSITY_PA were scaled for this analysis (but the interaction term
>still obviously is not). I conjecture that something in the vicinity of
>
>res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>summary(res)
>
>would reproduce the SPSS Beta values.
>
>
>> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP)
>><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>>
>> I had no problems running regression models in SPSS and R that yielded
>>the same results for these data.
>>
>> The difference you are observing is from fitting different models. In
>>R, you fitted:
>>
>> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
>> summary(res)
>>
>> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
>> product is not itself a standardized variable, and it is not the same as
>> "ZINTER_PA_C" in the png you showed. ZINTER_PA_C is not a variable in the
>> dataset, but it can be created with:
>>
>> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
>>
>> If you want the same results as in SPSS, then you need to fit:
>>
>> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C,
>>data=dat)
>> summary(res)
>>
>> This yields:
>>
>> Coefficients:
>>                Estimate Std. Error t value Pr(>|t|)
>> (Intercept)    6.41041     0.01722  372.21   <2e-16 ***
>> ZMEAN_PA      -1.62726     0.04200  -38.74   <2e-16 ***
>> ZDIVERSITY_PA -1.50082     0.07447  -20.15   <2e-16 ***
>> ZINTER_PA_C   -0.58955     0.05288  -11.15   <2e-16 ***
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Exactly the same as in the png.
>>
>> Peter already mentioned this as a possible reason for the discrepancy:
>>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it
>>perhaps the case that x1 and x2 have already been scaled to have
>>standard deviation 1? In that case, x1*x2 won't be.")
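>>
>> A quick numeric check of that parenthetical (simulated data, not the
>> thread's dataset; for a standard bivariate normal with correlation r,
>> the product z1*z2 has variance 1 + r^2):
>>
>> z <- MASS::mvrnorm(1e5, c(0, 0), matrix(c(1, .75, .75, 1), 2))
>> sd(z[, 1] * z[, 2])  # about sqrt(1 + .75^2) = 1.25, not 1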
>>
>> Best,
>> Wolfgang
>>
>> -----Original Message-----
>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick
>>Brown
>> Sent: Friday, May 05, 2017 10:40
>> To: peter dalgaard
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>>
>> Hi,
>>
>> Here is (I hope) all the relevant output from R.
>>
>> > mean(s1$ZDEPRESSION, na.rm=T)
>> [1] -1.041546e-16
>> > mean(s1$ZDIVERSITY_PA, na.rm=T)
>> [1] -9.660583e-16
>> > mean(s1$ZMEAN_PA, na.rm=T)
>> [1] -5.430282e-15
>> > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
>>       ZMEAN_PA  ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>>     -0.3962254     -0.3636026             -0.1425772
>> ## This is what I thought was the problem originally. :-)
>>
>>
>> > coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>>    (Intercept)       ZMEAN_PA  ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>>     0.07342198    -0.39650356    -0.36569488            -0.09435788
>> > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>>                       ZMEAN_PA  ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>>     0.07342198    -0.39650356    -0.36569488            -0.09435788
>>
>> The equivalent from SPSS is attached. The unstandardized coefficients in
>> SPSS look nothing like those in R. The standardized coefficients in SPSS
>> match the lm.ridge()$coef numbers very closely indeed, suggesting that
>> the same algorithm may be in use.
>>
>> I have put the dataset file, which is the untouched original I received
>> from the authors, in this Dropbox folder:
>> https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0
>> You can read it into R with this code (one variable needs to be
>> standardized and centered; everything else is already in the file):
>>
>> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE)
>> s1$ZDEPRESSION <- scale(s1$DEPRESSION)
>>
>> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm
>> sure IBM will want to fix it quickly (ha ha ha).
>>
>> Nick
>>
>> ----- Original Message -----
>>
>> From: "peter dalgaard" <pdalgd at gmail.com>
>> To: "Nick Brown" <nick.brown at free.fr>
>> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org
>> Sent: Friday, 5 May, 2017 10:02:10 AM
>> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>>
>> I asked you before, but in case you missed it: Are you looking at the
>>right place in SPSS output?
>>
>> The UNstandardized coefficients should be comparable to R, i.e. the "B"
>>column, not "Beta".
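>>
>> As a hypothetical sketch of that relationship for main effects (made-up
>> names 'ds', 'y', 'x1', 'x2'): each standardized Beta is the
>> unstandardized B rescaled by sd(x)/sd(y), so the two columns can be
>> converted into one another.
>>
>> fit <- lm(y ~ x1 + x2, data = ds)
>> coef(fit)[-1] * sapply(ds[c("x1", "x2")], sd) / sd(ds$y)  # Betas from Bs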
>>
>> -pd
>>
>>> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote:
>>>
>>> Hi Simon,
>>>
>>> Yes, if I use coefficients() I get the same results for lm() and
>>> lm.ridge(). So that's consistent, at least.
>>>
>>> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees
>>>with the value from SPSS to 5dp, which is an interesting coincidence if
>>>these numbers have no particular external meaning in lm.ridge().
>>>
>>> Kind regards,
>>> Nick
>>>
>>> ----- Original Message -----
>>>
>>> From: "Simon Bonner" <sbonner6 at uwo.ca>
>>> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org
>>> Sent: Thursday, 4 May, 2017 7:07:33 PM
>>> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS
>>>
>>> Hi Nick,
>>>
>>> I think that the problem here is your use of $coef to extract the
>>>coefficients of the ridge regression. The help for lm.ridge states that
>>>coef is a "matrix of coefficients, one row for each value of lambda.
>>>Note that these are not on the original scale and are for use by the
>>>coef method."
>>>
>>> I ran a small test with simulated data, code is copied below, and
>>>indeed the output from lm.ridge differs depending on whether the
>>>coefficients are accessed via $coef or via the coefficients() function.
>>>The latter does produce results that match the output from lm.
>>>
>>> I hope that helps.
>>>
>>> Cheers,
>>>
>>> Simon
>>>
>>> ## Load packages
>>> library(MASS)
>>>
>>> ## Set seed
>>> set.seed(8888)
>>>
>>> ## Set parameters
>>> n <- 100
>>> beta <- c(1,0,1)
>>> sigma <- .5
>>> rho <- .75
>>>
>>> ## Simulate correlated covariates
>>> Sigma <- matrix(c(1,rho,rho,1),ncol=2)
>>> X <- mvrnorm(n,c(0,0),Sigma=Sigma)
>>>
>>> ## Simulate data
>>> mu <- beta[1] + X %*% beta[-1]
>>> y <- rnorm(n,mu,sigma)
>>>
>>> ## Fit model with lm()
>>> fit1 <- lm(y ~ X)
>>>
>>> ## Fit model with lm.ridge()
>>> fit2 <- lm.ridge(y ~ X)
>>>
>>> ## Compare coefficients
>>> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2))
>>>
>>> [,1] [,2] [,3]
>>> (Intercept) 0.99276001 NA 0.99276001
>>> X1 -0.03980772 -0.04282391 -0.03980772
>>> X2 1.11167179 1.06200476 1.11167179
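>>>
>>> A follow-up sketch of why the two columns disagree, using the scaling
>>> information stored in the fitted object (coef, scales, ym, and xm are
>>> components of a ridgelm fit): $coef is on the scale of the centred and
>>> rescaled predictors, and coefficients() undoes that transformation.
>>>
>>> slopes <- fit2$coef / fit2$scales       ## back to the original x scale
>>> int <- fit2$ym - sum(slopes * fit2$xm)  ## re-centre the intercept
>>> c(int, slopes)                          ## matches coefficients(fit2)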
>>>
>>> --
>>>
>>> Simon Bonner
>>> Assistant Professor of Environmetrics/ Director MMASc
>>> Department of Statistical and Actuarial Sciences/Department of Biology
>>> University of Western Ontario
>>>
>>> Office: Western Science Centre rm 276
>>>
>>> Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax:
>>>519-661-3813
>>> Twitter: @bonnerstatslab | Website:
>>>http://simon.bonners.ca/bonner-lab/wpblog/
>>>
>>>> -----Original Message-----
>>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of
>>>> Nick Brown
>>>> Sent: May 4, 2017 10:29 AM
>>>> To: r-devel at r-project.org
>>>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
>>>>
>>>> Hallo,
>>>>
>>>> I hope I am posting to the right place. I was advised to try this
>>>>list by Ben Bolker
>>>> (https://twitter.com/bolkerb/status/859909918446497795). I also
>>>>posted this
>>>> question to StackOverflow
>>>>
>>>> (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
>>>> I am a relative newcomer to R, but I wrote my
>>>>first
>>>> program in 1975 and have been paid to program in about 15 different
>>>> languages, so I have some general background knowledge.
>>>>
>>>> I have a regression from which I extract the coefficients like this:
>>>> lm(y ~ x1 * x2, data=ds)$coef
>>>> That gives: x1=0.40, x2=0.37, x1*x2=0.09
>>>>
>>>> When I do the same regression in SPSS, I get:
>>>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
>>>> So the main effects are in agreement, but there is quite a difference
>>>>in the
>>>> coefficient for the interaction.
>>>>
>>>> X1 and X2 are correlated about .75 (yes, yes, I know - this model
>>>>wasn't my
>>>> idea, but it got published), so there is quite possibly something
>>>>going on with
>>>> collinearity. So I thought I'd try lm.ridge() to see if I can get an
>>>>idea of where
>>>> the problems are occurring.
>>>>
>>>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge
>>>>penalty) and
>>>> check we get the same results as with lm():
>>>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
>>>> x1=0.40, x2=0.37, x1*x2=0.14
>>>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course,
>>>>lambda=0 is the
>>>> default, so it can be omitted; I can alternate between including or
>>>>deleting
>>>> ".ridge" in the function call, and watch the coefficient for the
>>>>interaction
>>>> change.)
>>>>
>>>> What seems slightly strange to me here is that I assumed that
>>>>lm.ridge() just
>>>> piggybacks on lm() anyway, so in the specific case where lambda=0 and
>>>>there
>>>> is no "ridging" to do, I'd expect exactly the same results.
>>>>
>>>> Unfortunately there are 34,000 cases in the dataset, so a "minimal"
>>>>reprex will
>>>> not be easy to make, but I can share the data via Dropbox or
>>>>something if that
>>>> would help.
>>>>
>>>> I appreciate that when there is strong collinearity then all bets are
>>>>off in terms
>>>> of what the betas mean, but I would really expect lm() and lm.ridge()
>>>>to give
>>>> the same results. (I would be happy to ignore SPSS, but for the
>>>>moment it's
>>>> part of the majority!)
>>>>
>>>> Thanks for reading,
>>>> Nick
>
>--
>Peter Dalgaard, Professor,
>Center for Statistics, Copenhagen Business School
>Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>Phone: (+45)38153501
>Office: A 4.23
>Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com