# [R] FW: How to fit a linear model without intercept

Eik Vettorazzi E.Vettorazzi at uke.uni-hamburg.de
Wed Aug 29 12:40:53 CEST 2007

Hi Mark,
As a last comment, you may also take a look at
?summary.lm
where you will notice that R reports two different R-squared values
depending on the presence or absence of an intercept term. For
comparisons you should make sure that you use the same mathematical
object. Brian Ripley addressed this point in Jan 2006; see
http://tolstoy.newcastle.edu.au/R/help/06/01/18923.html
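
The difference is easy to see directly; here is a minimal sketch with simulated data (variable names are mine). With an intercept, summary.lm computes R² about mean(y); without one, it computes R² about zero, so the two values are not comparable.

```r
set.seed(42)
x <- 1:20
y <- 5 + 2 * x + rnorm(20)

fit_with    <- lm(y ~ x)      # intercept included
fit_without <- lm(y ~ x - 1)  # intercept suppressed

# Per ?summary.lm: R^2 is computed about mean(y) when there is an
# intercept, and about zero when there is not.
r2_with    <- summary(fit_with)$r.squared    # 1 - RSS/sum((y - mean(y))^2)
r2_without <- summary(fit_without)$r.squared # 1 - RSS/sum(y^2)
```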
hth.

Leeds, Mark (IED) wrote:
> Eik: Today I've been reading Myers's text, "Classical and Modern Regression with Applications", to refresh my memory
> about regression, because it's been a while since I looked at that material. The subtraction of the means from
> both sides of the equation causing the intercept to be zero now makes more sense because, in the simple regression
> case,
>
> b0 = ybar - b1 * xbar and, by subtracting the means, ybar and xbar both become zero, so b0 = zero.
>
> If you have any other comments, they are very much appreciated and always invited, but I think between what you showed and the above,
> it's clearer now. I think I will go with centering both the left and right hand sides to force the zero intercepts, estimate
> each model with the intercept (which will hopefully be numerically estimated as very close to zero), and then compare
> the R-squareds of the two models. If you still see this as a problem, let me know, because I am totally open to listening to other
> people's brains, especially good ones like yours.
>
>
>
> -----Original Message-----
> From: Eik Vettorazzi [mailto:E.Vettorazzi at uke.uni-hamburg.de]
> Sent: Tuesday, August 28, 2007 8:33 AM
> To: Leeds, Mark (IED)
> Cc: R-help
> Subject: Re: FW: [R] How to fit a linear model without intercept
>
> Hi Mark,
> I don't know wether you recived a sufficient reply or not, so here are my comments to your problem.
> Supressing the constant term in a regression model will probably lead to a violation of the classical assumptions for this model.
>  From the OLS normal equations (in matrix notation)
>  (1)      (X'X)b=X'y
> and the definition of the OLS residuals
>  (2)      e = y-Xb
> you get - by substituting y form (2) in (1)
>        (X'X)b=(X'X)b+X'e
> and hence
>        X'e =0.
> Without a constant term you cannot assure, that the ols residuals
> e=(y-Xb) will have zero mean, wich holds when involving a constant term, since the first equation of X'e = 0 gives in this case sum(e)=0.
>
> For decomposing the TSS (y'y) into ESS (b'X'Xb) and RSS (e'e), which is needed to compute R², you will need X'e=0, because then the cross-product term b'X'e vanishes.
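
The algebra above can be checked numerically; a minimal R sketch (simulated data; variable names are mine):

```r
set.seed(1)
x <- rnorm(30)
y <- 3 + 2 * x + rnorm(30)        # data generated with a true intercept of 3

e_with    <- resid(lm(y ~ x))     # constant term included
e_without <- resid(lm(y ~ x - 1)) # constant term suppressed

# With a constant, the first row of X'e = 0 forces sum(e) = 0.
sum(e_with)        # ~ 0, up to machine precision
# Without it, X'e = 0 only says sum(x * e) = 0; the residual mean is free.
sum(x * e_without) # ~ 0, up to machine precision
sum(e_without)     # generally far from zero
```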
> Correct me if I'm wrong.
>
> Leeds, Mark (IED) wrote:
>
>> Park, Eik : Could you start from the bottom and read this when you
>> have time. I really appreciate it.
>>
>> Basically, in a nutshell, my question is the "Hi John" part and I want
>> to do my study correctly. Thanks a lot.
>>
>>
>>
>> -----Original Message-----
>> From: Leeds, Mark (IED)
>> Sent: Thursday, August 23, 2007 1:05 PM
>> To: 'John Sorkin'
>> Cc: 'markleeds at verizon.net'
>> Subject: RE: [R] How to fit a linear model without intercept
>>
>>  Hi John: I'm from the R-list, obviously, and that was a nice example
>> that I cut and pasted and learned from. I'm sorry to bother you, but I
>> had a non-R question that I didn't want to pose to the R-list because
>> I think it's been discussed a lot in the past and I never focused on
>> the discussion.
>>
>> I need to do a study where I decide between two different univariate
>> regression models. The LHS is the same in both cases, and the goal of
>> the study is not to build a prediction model but rather to see which
>> RHS (univariate) explains the LHS better.
>> It's actually in a time series framework also, but that's not relevant
>> to my question. My question has 2 parts:
>>
>> 1) I was leaning towards using the R-squared as the decision criterion.
>> (I will be regressing monthly over a couple of years, so I will have
>> about 24 R-squareds. I have tons of data for one monthly regression,
>> so I don't have to do just one big regression over the whole time
>> period.) But I noticed in your previous example that the model with
>> an intercept (compared to the model forced to have zero intercept)
>> had a lower R^2 and a lower standard error at the same time! So this
>> asymmetry leads me to think that maybe I should be using the standard
>> error rather than the R-squared as my criterion?
>>
>> 2) This is possibly related to 1: isn't there a problem with using
>> the R-squared for anything when you force no intercept? That is
>> why I was thinking of including the intercept.
>> (The intercept in my problem really has no meaning, but I wanted to
>> retain the validity of the R-squared.) But now that I see your email,
>> maybe I should still be including an intercept and using the standard
>> error as the criterion.
>> Or maybe when you include an intercept (in both cases) you don't get
>> this asymmetry between R-squared and standard error.
>> I was surprised to see the asymmetry, but maybe it happens because one
>> is comparing a model with an intercept to a model without one, and no
>> intercept probably renders the R-squared criterion meaningless in the
>> latter.
>>
>> Thanks for any insight you can provide. I can also center and go
>> without an intercept, because it sounded like you DEFINITELY preferred
>> that method over just not including an intercept at all. I was
>> thinking of sending this question to the R-list, but I didn't want to
>> get hammered, because I know that this is not a new discussion. Thanks so much.
>>
>>
>>
>> Mark
>>
>> P.S : How the heck did you get an MD and a Ph.D ? Unbelievable. Did
>> you do them at the same time ?
>>
>>
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John Sorkin
>> Sent: Thursday, August 23, 2007 9:29 AM
>> To: David Barron; Michal Kneifl; r-help
>> Subject: Re: [R] How to fit a linear model without intercept
>>
>> Michael,
>> Assuming you want a model with an intercept of zero, I think we need
>> to ask why you want an intercept of zero. When a "normal" regression
>> indicates a non-zero intercept, forcing the regression line to have a
>> zero intercept changes the meaning of the regression coefficients. If
>> for some reason you want a zero intercept, but do not want to change
>> the meaning of the regression coefficients, i.e. you still want to
>> minimize the sum of the squared deviations from the BLUE (Best Linear
>> Unbiased Estimator) regression line, you can center your dependent
>> and independent variables and re-run the regression. Centering means
>> subtracting the mean of each variable from the variable before
>> performing the regression. When you do this, the intercept term will
>> be zero (or, more likely, a very, very small number that is not
>> statistically different from zero - it will not be exactly zero due
>> to the limits on the precision of computer calculations) and the
>> slope will be the same as that obtained from the "normal" BLUE
>> regression. What you are actually doing is transforming your data so
>> that it is centered around x=0, y=0, i.e. the means of the x and y
>> values will be zero. I am not sure this is what you want to do, but I
>> am pasting below some R code that will let you see the effect forcing
>> the intercept to be zero has on the slope, and how centering the data
>> yields a zero intercept without changing the slope.
>> John
>>
>>
>>
>>
>> # Set up x and y values. As defined, the slope of the regression
>> # should be close to one (save for the "noise" added to the y values)
>> # and the intercept should be close to four.
>> x <- 0:10
>> y <- x + 4 + rnorm(11, 0, 1)
>> plot(x, y)
>> title("Original data")
>>
>> # Fit a "normal" regression line to the data and display the
>> # regression line on the scatter plot.
>> fitNormalReg <- lm(y ~ x)
>> abline(fitNormalReg)
>>
>> # Fit a regression line in which the intercept has been forced to be
>> # zero and display the line on the scatter plot.
>> fitZeroInt <- lm(y ~ -1 + x)
>> abline(fitZeroInt, lty = 2)
>>
>> # Compare fits. There is a statistically significant difference
>> # between the models - the model with an intercept, the "normal"
>> # regression, is the better fit.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> anova(fitZeroInt, fitNormalReg)
>>
>> # Center y and x by subtracting their means.
>> yCentered <- y - mean(y)
>> xCentered <- x - mean(x)
>> # Regress the centered y values on the centered x values. This will
>> # give a model whose intercept is very, very small; it would be
>> # exactly zero save for the precision limits inherent in computer
>> # arithmetic. Plot the line. Notice that the slope of the centered
>> # regression is the same as that obtained from the normal regression.
>> fitCentered <- lm(yCentered ~ xCentered)
>> abline(fitCentered, lty = 3)
>>
>> # Compare the three regressions. The slopes from the "normal" and
>> # centered regressions are the same; the intercept from the centered
>> # regression is very, very small and would be zero save for the
>> # limits of computer mathematics.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> summary(fitCentered)
>>
>> # Plot the centered data and show that the line goes through zero.
>> plot(xCentered, yCentered)
>> abline(fitCentered)
>> title("Centered data")
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> John Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> Baltimore VA Medical Center GRECC,
>> University of Maryland School of Medicine Claude D. Pepper OAIC,
>> University of Maryland Clinical Nutrition Research Unit, and Baltimore
>> VA Center Stroke of Excellence
>>
>> University of Maryland School of Medicine Division of Gerontology
>> Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>>
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> jsorkin at grecc.umaryland.edu
>>
>>
>>
>>>>> "David Barron" <mothsailor at googlemail.com> 08/23/07 5:38 AM >>>
>> A number of alternatives, such as:
>>
>> lm(y ~ 0 + x)
>> lm(y ~ x -1)
>>
>> See ?formula
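
Both forms drop the intercept and fit the same model; a quick check (simulated data; variable names are mine):

```r
set.seed(7)
x <- rnorm(25)
y <- 2 * x + rnorm(25)

f1 <- lm(y ~ 0 + x)  # "0 +" removes the intercept
f2 <- lm(y ~ x - 1)  # "- 1" does the same

all.equal(coef(f1), coef(f2))  # TRUE: identical fits
```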
>>
>> On 8/23/07, Michal Kneifl <xkneifl at mendelu.cz> wrote:
>>
>>
>>> Could anyone please help me?
>>> How can I fit a linear model in which an intercept makes no sense?
>>>
>>> Michael
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>> --
>> =================================
>> David Barron
>> University of Oxford
>> Park End Street
>> Oxford OX1 1HP
>>
>>
>>
--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/42803-8243
F ++49/40/42803-7790
