[R] FW: How to fit an linear model withou intercept
Eik Vettorazzi
E.Vettorazzi at uke.uni-hamburg.de
Wed Aug 29 12:40:53 CEST 2007
Hi Mark,
as a last comment, you may also want to take a look at
?summary.lm
where you will notice that R reports two different R-squared values depending
on the presence or absence of an intercept term: with an intercept, the total
sum of squares is taken about the mean of y; without one, it is taken about
zero. When comparing models, make sure you are comparing the same mathematical
object.
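A minimal sketch of the two definitions, assuming simulated data like that in John's example further down (the seed and the variable names are assumptions made for this illustration). It recomputes both R-squared values by hand and checks them against what summary() reports:

```r
# Illustrative data only; seed and names are assumptions for this sketch.
set.seed(1)
x <- 0:10
y <- x + 4 + rnorm(11)

fit_int  <- lm(y ~ x)      # model with an intercept
fit_zero <- lm(y ~ 0 + x)  # intercept forced to zero

rss_int  <- sum(residuals(fit_int)^2)
rss_zero <- sum(residuals(fit_zero)^2)

# With an intercept, the baseline is the mean of y ...
r2_int  <- 1 - rss_int / sum((y - mean(y))^2)
# ... without one, the baseline is zero.
r2_zero <- 1 - rss_zero / sum(y^2)

# Both hand-computed values should agree with summary.lm
all.equal(r2_int,  summary(fit_int)$r.squared)
all.equal(r2_zero, summary(fit_zero)$r.squared)

# The zero-intercept fit judged against the same baseline as fit_int
r2_zero_centered <- 1 - rss_zero / sum((y - mean(y))^2)
c(reported = r2_zero, same_baseline = r2_zero_centered, with_intercept = r2_int)
```

Measured against the same baseline, the zero-intercept fit cannot score higher than the fit with an intercept, because its residual sum of squares is at least as large.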
There was a thread about this in Jan 2006 (from which I essentially took Prof.
Ripley's reply for this answer), as you can see in
http://tolstoy.newcastle.edu.au/R/help/06/01/18923.html
hth.
Leeds, Mark (IED) wrote:
> Eik: Today I've been reading Myers' text, "Classical and Modern Regression with Applications", to refresh my memory
> about regression because it's been a while since I looked at that material. The subtraction of the means from
> both sides of the equation causing the intercept to be zero now makes more sense because, in the simple regression
> case,
>
> b0 = y bar - b1 x bar and, by subtracting the means, y bar and x bar both become zero, so b0 = zero.
>
> If you have any other comments, they are much appreciated and always welcome, but I think that between what you showed and the above,
> it's clearer now. I think I will go with centering both the left- and right-hand sides to force the zero intercepts, estimate
> each model with the intercept (which will hopefully be estimated numerically as very close to zero), and then compare
> the R-squareds of the two models. If you still see this as a problem, let me know, because I am totally open to picking other
> people's brains, especially good ones like yours.
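A small sketch of the plan described above, assuming simulated data (the seed and variable names are assumptions for illustration). It suggests that centering both sides leaves the slope and the R-squared of the ordinary with-intercept fit unchanged, so the centered fit adds no new information for the comparison:

```r
# Illustrative data only; seed and names are assumptions for this sketch.
set.seed(1)
x <- 0:10
y <- x + 4 + rnorm(11)

xc <- x - mean(x)  # centered predictor
yc <- y - mean(y)  # centered response

fit_raw           <- lm(y ~ x)        # ordinary fit with intercept
fit_centered      <- lm(yc ~ xc)      # centered data, intercept kept
fit_centered_zero <- lm(yc ~ 0 + xc)  # centered data, intercept dropped

# The intercept of the centered fit is zero up to floating-point error
coef(fit_centered)[["(Intercept)"]]

# Slope and R-squared agree with the uncentered fit
all.equal(coef(fit_raw)[["x"]], coef(fit_centered)[["xc"]])
all.equal(summary(fit_raw)$r.squared, summary(fit_centered)$r.squared)

# After centering, dropping the intercept does not change R-squared either,
# because the mean of yc is already zero
all.equal(summary(fit_raw)$r.squared, summary(fit_centered_zero)$r.squared)
```

The residual degrees of freedom differ by one between fit_centered and fit_centered_zero, so standard errors and adjusted R-squared will not match exactly between those two.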
>
>
>
> -----Original Message-----
> From: Eik Vettorazzi [mailto:E.Vettorazzi at uke.uni-hamburg.de]
> Sent: Tuesday, August 28, 2007 8:33 AM
> To: Leeds, Mark (IED)
> Cc: R-help
> Subject: Re: FW: [R] How to fit an linear model withou intercept
>
> Hi Mark,
> I don't know whether you received a sufficient reply or not, so here are my comments on your problem.
> Suppressing the constant term in a regression model will probably lead to a violation of the classical assumptions for this model.
> From the OLS normal equations (in matrix notation)
> (1) (X'X)b = X'y
> and the definition of the OLS residuals
> (2) e = y - Xb
> you get, by substituting y = Xb + e from (2) into (1),
> (X'X)b = (X'X)b + X'e
> and hence
> X'e = 0.
> Without a constant term you cannot ensure that the OLS residuals
> e = (y - Xb) will have zero mean. With a constant term they do, since the first equation of X'e = 0 then gives sum(e) = 0.
>
> To decompose the TSS (y'y) into ESS (b'X'Xb) and RSS (e'e), which is needed to compute R², you need X'e = 0, because then the cross-product term b'X'e vanishes.
> Correct me if I'm wrong.
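A numerical check of the argument above, as a sketch on simulated data (the seed and variable names are assumptions for illustration). It suggests that X'e = 0 holds with or without a constant, so the uncentered decomposition of y'y is always available, while the decomposition about the mean of y also needs sum(e) = 0 and therefore the constant term:

```r
# Illustrative data only; seed and names are assumptions for this sketch.
set.seed(1)
x <- 0:10
y <- x + 4 + rnorm(11)

fit_int  <- lm(y ~ x)      # with constant term
fit_zero <- lm(y ~ 0 + x)  # without constant term

e_int  <- residuals(fit_int)
e_zero <- residuals(fit_zero)

# X'e = 0 (up to rounding) for both fits
crossprod(model.matrix(fit_int),  e_int)
crossprod(model.matrix(fit_zero), e_zero)

# The residuals sum to zero only when the constant is in the model
sum(e_int)
sum(e_zero)

# Uncentered decomposition y'y = b'X'Xb + e'e holds even without a constant
all.equal(sum(y^2), sum(fitted(fit_zero)^2) + sum(e_zero^2))

# Decomposition about the mean of y holds with the constant ...
tss_c <- sum((y - mean(y))^2)
all.equal(tss_c, sum((fitted(fit_int) - mean(y))^2) + sum(e_int^2))
# ... but in general not without it
isTRUE(all.equal(tss_c, sum((fitted(fit_zero) - mean(y))^2) + sum(e_zero^2)))
```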
>
> Leeds, Mark (IED) wrote:
>
>> Park, Eik : Could you start from the bottom and read this when you
>> have time. I really appreciate it.
>>
>> Basically, in a nutshell, my question is the "Hi John" part and I want
>> to do my study correctly. Thanks a lot.
>>
>>
>>
>> -----Original Message-----
>> From: Leeds, Mark (IED)
>> Sent: Thursday, August 23, 2007 1:05 PM
>> To: 'John Sorkin'
>> Cc: 'markleeds at verizon.net'
>> Subject: RE: [R] How to fit an linear model withou intercept
>>
>> Hi John: I'm from the R-list, obviously, and that was a nice example
>> that I cut and pasted and learned from. I'm sorry to bother you, but I
>> had a non-R question that I didn't want to pose to the R-list because
>> I think it's been discussed a lot in the past but I never focused on
>> the discussion.
>>
>> I need to do a study where I decide between two different univariate
>> regression models. The LHS is the same in both cases, and it's not the
>> goal of the study to build a prediction model but rather to see which
>> RHS (univariate) explains the LHS better.
>> It's actually in a time series framework also, but that's not relevant
>> for my question. My question has 2 parts:
>>
>> 1) I was leaning towards using the R-squared as the decision criterion
>> (I will be regressing monthly over a couple of years, so I will
>> have about 24 R-squareds. I have tons of data for each monthly
>> regression, so I don't have to do just one big regression over the
>> whole time period), but I noticed in your previous example that the
>> model with intercept (compared to the model forced to have zero
>> intercept) had a lower R^2 and a lower standard error at the same
>> time! This asymmetry leads me to think that maybe I should be
>> using the standard error rather than R-squared as my criterion?
>>
>> 2) This is possibly related to 1: Isn't there a problem with using
>> the R-squared for anything when you force no intercept?
>> I think I remember seeing discussions about this on the list. That's
>> why I was thinking of including the intercept
>> (the intercept in my problem really has no meaning, but I wanted to retain
>> the validity of the R-squared). But now that I see your email, maybe I
>> should still be including an intercept and using the standard error as the
>> criterion.
>> Or maybe when you include an intercept (in both cases) you don't get
>> this asymmetry between R-squared and standard error.
>> I was surprised to see the asymmetry, but maybe it happens because one
>> is comparing a model with intercept to a model without intercept, and no
>> intercept probably renders the R-squared criterion meaningless in the
>> latter.
>>
>> Thanks for any insight you can provide. I can also center and go
>> without intercept, because it sounded like you DEFINITELY preferred
>> that method over just not including an intercept at all. I was
>> thinking of sending this question to the R-list, but I didn't want to
>> get hammered because I know that this is not a new discussion. Thanks so much.
>>
>>
>>
>> Mark
>>
>> P.S : How the heck did you get an MD and a Ph.D ? Unbelievable. Did
>> you do them at the same time ?
>>
>>
>>
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John Sorkin
>> Sent: Thursday, August 23, 2007 9:29 AM
>> To: David Barron; Michal Kneifl; r-help
>> Subject: Re: [R] How to fit an linear model withou intercept
>>
>> Michael,
>> Assuming you want a model with an intercept of zero, I think we need
>> to ask you why you want an intercept of zero. When a "normal"
>> regression indicates a non-zero intercept, forcing the regression line
>> to have a zero intercept changes the meaning of the regression
>> coefficients. If for some reason you want to have a zero intercept,
>> but do not want to change the meaning of the regression coefficients,
>> i.e. you still want to minimize the sum of the squared deviations from
>> the BLUE (Best Linear Unbiased Estimator) of the regression,
>> you can center your dependent and independent variables and re-run the
>> regression. Centering means subtracting the mean of each variable from
>> the variable before performing the regression. When you do this, the
>> intercept term will be zero (or more likely a very, very, very small
>> number that is not statistically different from zero; it will not be
>> exactly zero due to limits on the precision of computer calculations)
>> and the slope term will be the same
>> as that you obtained from the "normal" BLUE regression. What you
>> are actually doing is transforming your data so it is centered around
>> x=0, y=0, i.e. the mean of the x and y terms will be zero. I am not
>> sure this is what you want to do, but I am pasting below some R code
>> that will allow you to see the effect forcing the intercept to be
>> zero has on the slope, and how centering the data yields a zero
>> intercept without changing the slope.
>> John
>>
>>
>>
>> oldpar <- par(ask = TRUE)
>>
>> # Set up x and y values. As defined, the slope of the regression
>> # should be close to one (save for the "noise" added to the y values)
>> # and the intercept should be close to four.
>> x <- 0:10
>> y <- x + 4 + rnorm(11, 0, 1)
>> plot(x, y)
>> title("Original data")
>>
>> # Fit a "normal" regression line to the data and display
>> # the regression line on the scatter plot.
>> fitNormalReg <- lm(y ~ x)
>> abline(fitNormalReg)
>>
>> # Fit a regression line in which the intercept has been forced to be
>> # zero and display the line on the scatter plot.
>> fitZeroInt <- lm(y ~ -1 + x)
>> abline(fitZeroInt, lty = 2)
>>
>> # Compare fits.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> # There is a statistically significant difference between the models:
>> # the model with an intercept, the "normal" regression, is the
>> # better fit.
>> anova(fitZeroInt, fitNormalReg)
>>
>> # Center y and x by subtracting their means.
>> yCentered <- y - mean(y)
>> xCentered <- x - mean(x)
>> # Regress the centered y values on the centered x values. This will
>> # give us a model with an intercept that is very, very small. It would
>> # be zero save for the precision limits inherent in using a computer.
>> # Plot the line. Notice the slope of the centered regression is the
>> # same as that obtained from the normal regression.
>> fitCentered <- lm(yCentered ~ xCentered)
>> abline(fitCentered, lty = 10)
>>
>> # Compare the three regressions. Note the slopes from the "normal"
>> # regression and the centered regression are the same.
>> # The intercept from the centered regression is very, very small and
>> # would be zero save for the limits of computer mathematics.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> summary(fitCentered)
>>
>> # Plot the centered data and show that the line goes through zero.
>> plot(xCentered, yCentered)
>> abline(fitCentered)
>> title("Centered data")
>> # Restore the original graphics settings.
>> par(oldpar)
>>
>>
>>
>>
>>
>>
>>
>>
>> John Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> Baltimore VA Medical Center GRECC,
>> University of Maryland School of Medicine Claude D. Pepper OAIC,
>> University of Maryland Clinical Nutrition Research Unit, and Baltimore
>> VA Center Stroke of Excellence
>>
>> University of Maryland School of Medicine Division of Gerontology
>> Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>>
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> jsorkin at grecc.umaryland.edu
>>
>>
>>
>>>>> "David Barron" <mothsailor at googlemail.com> 08/23/07 5:38 AM >>>
>>>>>
>>>>>
>> A number of alternatives, such as:
>>
>> lm(y ~ 0 + x)
>> lm(y ~ x -1)
>>
>> See ?formula
>>
>> On 8/23/07, Michal Kneifl <xkneifl at mendelu.cz> wrote:
>>
>>
>>> Please could anyone help me?
>>> How can I fit a linear model where an intercept makes no sense?
>>> Thanks in advance..
>>>
>>> Michael
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>> --
>> =================================
>> David Barron
>> Said Business School
>> University of Oxford
>> Park End Street
>> Oxford OX1 1HP
>>
--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/42803-8243
F ++49/40/42803-7790