[Rd] lm() gives different results to lm.ridge() and SPSS
peter dalgaard
pdalgd at gmail.com
Sat May 6 10:57:59 CEST 2017
> On 6 May 2017, at 01:49 , Nick Brown <nick.brown at free.fr> wrote:
>
> Hi John,
>
> Thanks for the comment... but that appears to mean that SPSS has a big problem. I have always been told that to include an interaction term in a regression, the only way is to do the multiplication by hand. But then it seems to be impossible to stop SPSS from re-standardizing the variable that corresponds to the interaction term. Am I missing something? Is there a way to perform the regression with the interaction in SPSS without computing the interaction as a separate variable?
Just look at the unstandardized coefficients in SPSS. The standardized ones are of some usefulness, but it is limited in the case of syntesized regressors like product terms. I imagine that the interpretation also goes whacky for squared terms, dummy coded groupings, etc.
(Does SPSS really still not have an automated way of generating interaction terms? 1977 called.... googling.... Looks like GLM understands them, REGRESS not.)
-pd
>
> Best,
> Nick
>
> From: "John Fox" <jfox at mcmaster.ca>
> To: "Nick Brown" <nick.brown at free.fr>, "peter dalgaard" <pdalgd at gmail.com>
> Cc: r-devel at r-project.org
> Sent: Friday, 5 May, 2017 8:22:53 PM
> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>
> Dear Nick,
>
>
> On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown"
> <r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:
>
> >>I conjecture that something in the vicinity of
> >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
> >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
> >>summary(res)
> >> would reproduce the SPSS Beta values.
> >
> >Yes, that works. Thanks!
>
> That you have to work hard in R to match the SPSS results isn’t such a bad
> thing when you factor in the observation that standardizing the
> interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of
> its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense.
>
> Best,
> John
>
> -------------------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> Web: http://socserv.mcmaster.ca/jfox/
>
>
> >
> >
> >----- Original Message -----
> >
> >From: "peter dalgaard" <pdalgd at gmail.com>
> >To: "Viechtbauer Wolfgang (SP)"
> ><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown"
> ><nick.brown at free.fr>
> >Cc: r-devel at r-project.org
> >Sent: Friday, 5 May, 2017 3:33:29 PM
> >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
> >
> >Thanks, I was getting to try this, but got side tracked by actual work...
> >
> >Your analysis reproduces the SPSS unscaled estimates. It still remains to
> >figure out how Nick got
> >
> >>
> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
> >
> >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
> >0.07342198 -0.39650356 -0.36569488 -0.09435788
> >
> >
> >which does not match your output. I suspect that ZMEAN_PA and
> >ZDIVERSITY_PA were scaled for this analysis (but the interaction term
> >still obviously is not). I conjecture that something in the vicinity of
> >
> >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
> >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
> >summary(res)
> >
> >would reproduce the SPSS Beta values.
> >
> >
> >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP)
> >><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> >>
> >> I had no problems running regression models in SPSS and R that yielded
> >>the same results for these data.
> >>
> >> The difference you are observing is from fitting different models. In
> >>R, you fitted:
> >>
> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
> >> summary(res)
> >>
> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
> >>is not a standardized variable itself and not the same as "ZINTER_PA_C"
> >>in the png you showed, which is not a variable in the dataset, but can
> >>be created with:
> >>
> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
> >>
> >> If you want the same results as in SPSS, then you need to fit:
> >>
> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C,
> >>data=dat)
> >> summary(res)
> >>
> >> This yields:
> >>
> >> Coefficients:
> >> Estimate Std. Error t value Pr(>|t|)
> >> (Intercept) 6.41041 0.01722 372.21 <2e-16 ***
> >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 ***
> >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 ***
> >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 ***
> >> ---
> >> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>
> >> Exactly the same as in the png.
> >>
> >> Peter already mentioned this as a possible reason for the discrepancy:
> >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it
> >>perhaps the case that x1 and x2 have already been scaled to have
> >>standard deviation 1? In that case, x1*x2 won't be.")
> >>
> >> Best,
> >> Wolfgang
> >>
> >> -----Original Message-----
> >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick
> >>Brown
> >> Sent: Friday, May 05, 2017 10:40
> >> To: peter dalgaard
> >> Cc: r-devel at r-project.org
> >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
> >>
> >> Hi,
> >>
> >> Here is (I hope) all the relevant output from R.
> >>
> >>> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 >
> >>>mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA,
> >>>na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA *
> >>>ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA
> >>>ZMEAN_PA:ZDIVERSITY_PA
> >> -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the
> >>problem originally. :-)
> >>
> >>
> >>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
> >>>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
> >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 >
> >>coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
> >>ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
> >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS
> >>is attached. The unstandardized coefficients in SPSS look nothing like
> >>those in R. The standardized coefficients in SPSS match the
> >>lm.ridge()$coef numbers very closely indeed, suggesting that the same
> >>algorithm may be in use.
> >>
> >> I have put the dataset file, which is the untouched original I received
> >>from the authors, in this Dropbox folder:
> >>https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0
> >>. You can read it into R with this code (one variable needs to be
> >>standardized and centered; everything else is already in the file):
> >>
> >> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE)
> >>s1$ZDEPRESSION <- scale(s1$DEPRESSION)
> >> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm
> >>sure IBM will want to fix it quickly (ha ha ha).
> >>
> >> Nick
> >>
> >> ----- Original Message -----
> >>
> >> From: "peter dalgaard" <pdalgd at gmail.com>
> >> To: "Nick Brown" <nick.brown at free.fr>
> >> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org
> >> Sent: Friday, 5 May, 2017 10:02:10 AM
> >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
> >>
> >> I asked you before, but in case you missed it: Are you looking at the
> >>right place in SPSS output?
> >>
> >> The UNstandardized coefficients should be comparable to R, i.e. the "B"
> >>column, not "Beta".
> >>
> >> -pd
> >>
> >>> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote:
> >>>
> >>> Hi Simon,
> >>>
> >>> Yes, if I uses coefficients() I get the same results for lm() and
> >>>lm.ridge(). So that's consistent, at least.
> >>>
> >>> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees
> >>>with the value from SPSS to 5dp, which is an interesting coincidence if
> >>>these numbers have no particular external meaning in lm.ridge().
> >>>
> >>> Kind regards,
> >>> Nick
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Simon Bonner" <sbonner6 at uwo.ca>
> >>> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org
> >>> Sent: Thursday, 4 May, 2017 7:07:33 PM
> >>> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS
> >>>
> >>> Hi Nick,
> >>>
> >>> I think that the problem here is your use of $coef to extract the
> >>>coefficients of the ridge regression. The help for lm.ridge states that
> >>>coef is a "matrix of coefficients, one row for each value of lambda.
> >>>Note that these are not on the original scale and are for use by the
> >>>coef method."
> >>>
> >>> I ran a small test with simulated data, code is copied below, and
> >>>indeed the output from lm.ridge differs depending on whether the
> >>>coefficients are accessed via $coef or via the coefficients() function.
> >>>The latter does produce results that match the output from lm.
> >>>
> >>> I hope that helps.
> >>>
> >>> Cheers,
> >>>
> >>> Simon
> >>>
> >>> ## Load packages
> >>> library(MASS)
> >>>
> >>> ## Set seed
> >>> set.seed(8888)
> >>>
> >>> ## Set parameters
> >>> n <- 100
> >>> beta <- c(1,0,1)
> >>> sigma <- .5
> >>> rho <- .75
> >>>
> >>> ## Simulate correlated covariates
> >>> Sigma <- matrix(c(1,rho,rho,1),ncol=2)
> >>> X <- mvrnorm(n,c(0,0),Sigma=Sigma)
> >>>
> >>> ## Simulate data
> >>> mu <- beta[1] + X %*% beta[-1]
> >>> y <- rnorm(n,mu,sigma)
> >>>
> >>> ## Fit model with lm()
> >>> fit1 <- lm(y ~ X)
> >>>
> >>> ## Fit model with lm.ridge()
> >>> fit2 <- lm.ridge(y ~ X)
> >>>
> >>> ## Compare coefficients
> >>> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2))
> >>>
> >>> [,1] [,2] [,3]
> >>> (Intercept) 0.99276001 NA 0.99276001
> >>> X1 -0.03980772 -0.04282391 -0.03980772
> >>> X2 1.11167179 1.06200476 1.11167179
> >>>
> >>> --
> >>>
> >>> Simon Bonner
> >>> Assistant Professor of Environmetrics/ Director MMASc
> >>> Department of Statistical and Actuarial Sciences/Department of Biology
> >>> University of Western Ontario
> >>>
> >>> Office: Western Science Centre rm 276
> >>>
> >>> Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax:
> >>>519-661-3813
> >>> Twitter: @bonnerstatslab | Website:
> >>>http://simon.bonners.ca/bonner-lab/wpblog/
> >>>
> >>>> -----Original Message-----
> >>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of
> >>>>Nick
> >>>> Brown
> >>>> Sent: May 4, 2017 10:29 AM
> >>>> To: r-devel at r-project.org
> >>>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
> >>>>
> >>>> Hallo,
> >>>>
> >>>> I hope I am posting to the right place. I was advised to try this
> >>>>list by Ben Bolker
> >>>> (https://twitter.com/bolkerb/status/859909918446497795). I also
> >>>>posted this
> >>>> question to StackOverflow
> >>>>
> >>>>(http://stackoverflow.com/questions/43771269/lm-gives-different-results
> >>>>-
> >>>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my
> >>>>first
> >>>> program in 1975 and have been paid to program in about 15 different
> >>>> languages, so I have some general background knowledge.
> >>>>
> >>>> I have a regression from which I extract the coefficients like this:
> >>>> lm(y ~ x1 * x2, data=ds)$coef
> >>>> That gives: x1=0.40, x2=0.37, x1*x2=0.09
> >>>>
> >>>> When I do the same regression in SPSS, I get:
> >>>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
> >>>> So the main effects are in agreement, but there is quite a difference
> >>>>in the
> >>>> coefficient for the interaction.
> >>>>
> >>>> X1 and X2 are correlated about .75 (yes, yes, I know - this model
> >>>>wasn't my
> >>>> idea, but it got published), so there is quite possibly something
> >>>>going on with
> >>>> collinearity. So I thought I'd try lm.ridge() to see if I can get an
> >>>>idea of where
> >>>> the problems are occurring.
> >>>>
> >>>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge
> >>>>penalty) and
> >>>> check we get the same results as with lm():
> >>>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
> >>>> x1=0.40, x2=0.37, x1*x2=0.14
> >>>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course,
> >>>>lambda=0 is the
> >>>> default, so it can be omitted; I can alternate between including or
> >>>>deleting
> >>>> ".ridge" in the function call, and watch the coefficient for the
> >>>>interaction
> >>>> change.)
> >>>>
> >>>> What seems slightly strange to me here is that I assumed that
> >>>>lm.ridge() just
> >>>> piggybacks on lm() anyway, so in the specific case where lambda=0 and
> >>>>there
> >>>> is no "ridging" to do, I'd expect exactly the same results.
> >>>>
> >>>> Unfortunately there are 34,000 cases in the dataset, so a "minimal"
> >>>>reprex will
> >>>> not be easy to make, but I can share the data via Dropbox or
> >>>>something if that
> >>>> would help.
> >>>>
> >>>> I appreciate that when there is strong collinearity then all bets are
> >>>>off in terms
> >>>> of what the betas mean, but I would really expect lm() and lm.ridge()
> >>>>to give
> >>>> the same results. (I would be happy to ignore SPSS, but for the
> >>>>moment it's
> >>>> part of the majority!)
> >>>>
> >>>> Thanks for reading,
> >>>> Nick
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >--
> >Peter Dalgaard, Professor,
> >Center for Statistics, Copenhagen Business School
> >Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >Phone: (+45)38153501
> >Office: A 4.23
> >Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-devel at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-devel
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
More information about the R-devel
mailing list