[R] FW: How to fit a linear model without intercept
Eik Vettorazzi
E.Vettorazzi at uke.uni-hamburg.de
Tue Aug 28 14:33:00 CEST 2007
Hi Mark,
I don't know whether you received a sufficient reply or not, so here are
my comments on your problem.
Suppressing the constant term in a regression model will probably lead to
a violation of the classical assumptions for this model, in particular
that the errors have zero mean (unless the true intercept really is zero).
From the OLS normal equations (in matrix notation)
(1) (X'X)b=X'y
and the definition of the OLS residuals
(2) e = y-Xb
you get, by substituting y = Xb + e from (2) into (1),
(X'X)b=(X'X)b+X'e
and hence
X'e =0.
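A minimal numerical check of (1), (2) and X'e = 0, sketched in R. The simulated data and the seed are illustrative assumptions, not Mark's data; model.matrix() builds X, and crossprod(X, e) computes X'e.

```r
set.seed(1)                      # illustrative seed, not from the thread
x <- 0:10
y <- x + 4 + rnorm(11)           # toy data of the same kind as in John's example

X <- model.matrix(~ x)           # design matrix, first column is all ones
b <- solve(crossprod(X), crossprod(X, y))   # solves the normal equations (X'X)b = X'y
e <- y - X %*% b                 # OLS residuals, equation (2)

# b agrees with lm(), and X'e is zero up to rounding error
all.equal(as.vector(b), unname(coef(lm(y ~ x))))
crossprod(X, e)
```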
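Without a constant term you cannot ensure that the OLS residuals
e = y - Xb have zero mean. With a constant term this holds automatically:
the first column of X is then a column of ones, so the first equation of
X'e = 0 reads sum(e) = 0.
For decomposing the TSS into ESS and RSS, which is needed to compute R²,
the cross-product term has to vanish. For the uncentered sums,
y'y = b'X'Xb + e'e, this follows from X'e = 0 alone, because the
cross-product term is 2b'X'e. The usual R², however, uses the centered
TSS, sum((y - mean(y))^2), and that decomposition additionally requires
sum(e) = 0, which is only guaranteed when the model contains a constant.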
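The zero-mean point can be illustrated with the same kind of toy data. This is a sketch on simulated, hypothetical data: with a column of ones in X the residuals sum to zero, while after dropping it only the equation for the x column of X'e = 0 remains.

```r
set.seed(1)                      # illustrative data, not from the thread
x <- 0:10
y <- x + 4 + rnorm(11)

fitWith    <- lm(y ~ x)          # X contains a column of ones
fitWithout <- lm(y ~ 0 + x)      # constant term suppressed

sum(residuals(fitWith))          # zero up to rounding error
sum(residuals(fitWithout))       # generally not zero
sum(x * residuals(fitWithout))   # the x column of X'e = 0 still holds
```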
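A sketch of both decompositions for a fit without intercept, again on hypothetical simulated data. The last line checks how summary.lm() handles this case: for a model without intercept it computes R² against the uncentered y'y, so that value is not comparable with the R² of a model with intercept.

```r
set.seed(1)                      # illustrative data, not from the thread
x <- 0:10
y <- x + 4 + rnorm(11)
fitWithout <- lm(y ~ 0 + x)
f <- fitted(fitWithout)
e <- residuals(fitWithout)

# Uncentered: y'y = b'X'Xb + e'e holds because X'e = 0
all.equal(sum(y^2), sum(f^2) + sum(e^2))

# Centered: the decomposition fails because sum(e) != 0
sum((y - mean(y))^2) - (sum((f - mean(y))^2) + sum(e^2))

# summary.lm() uses the uncentered total sum of squares without an intercept
all.equal(summary(fitWithout)$r.squared, 1 - sum(e^2) / sum(y^2))
```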
Correct me if I'm wrong.
Leeds, Mark (IED) wrote:
> Park, Eik: Could you start from the bottom and read this when you have
> time? I really appreciate it.
>
> Basically, my question is the "Hi John" part, and I want to do my study
> correctly. Thanks a lot.
>
>
>
> -----Original Message-----
> From: Leeds, Mark (IED)
> Sent: Thursday, August 23, 2007 1:05 PM
> To: 'John Sorkin'
> Cc: 'markleeds at verizon.net'
> Subject: RE: [R] How to fit a linear model without intercept
>
> Hi John: I'm from the R-list, obviously, and that was a nice example
> that I cut and pasted and learned from. I'm sorry to bother you, but I
> have a non-R question that I didn't want to pose to the R-list because I
> think it's been discussed a lot in the past, but I never focused on the
> discussion.
>
> I need to do a study where I decide between two different univariate
> regression models. The LHS is the same in both cases, and the goal of
> the study is not to build a prediction model but rather to see which
> RHS (univariate) explains the LHS better.
> It's actually in a time series framework as well, but that's not
> relevant for my question. My question has 2 parts:
>
> 1) I was leaning towards using R-squared as the decision criterion
> (I will be regressing monthly over a couple of years, so I will have
> about 24 R-squareds; I have tons of data for one monthly regression, so
> I don't have to do just one big regression over the whole time period),
> but I noticed in your previous example that the model with intercept
> (compared to the model forced to have zero intercept) had a lower R^2
> and a lower standard error at the same time! This asymmetry leads me
> to think that maybe I should be using the standard error rather than
> R-squared as my criterion?
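The asymmetry Mark describes can be reproduced with a sketch like the following. The data are hypothetical, with a weaker slope than in John's example (0.2 instead of 1) so that the effect is clearly visible; with other data the R² ordering can go either way. summary.lm() reports the residual standard error as sigma.

```r
set.seed(1)                       # hypothetical data, not from the thread
x <- 0:10
y <- 0.2 * x + 4 + rnorm(11)      # weak slope, clearly non-zero intercept

sWith    <- summary(lm(y ~ x))        # R^2 relative to the centered TSS
sWithout <- summary(lm(y ~ 0 + x))    # R^2 relative to the uncentered y'y

# Here the fit without intercept has the larger R^2 but also the larger sigma
c(with = sWith$r.squared, without = sWithout$r.squared)
c(with = sWith$sigma,     without = sWithout$sigma)
```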
>
> 2) This is possibly related to 1: isn't there a problem with using
> R-squared for anything when you force no intercept?
> I think I remember seeing discussions about this on the list. That's why
> I was thinking of including the intercept
> (the intercept in my problem really has no meaning, but I wanted to
> retain the validity of R-squared). But now that I see your email, maybe
> I should still include an intercept and use the standard error as the
> criterion.
> Or maybe when you include an intercept (in both cases) you don't get
> this asymmetry between R-squared and standard error.
> I was surprised to see the asymmetry, but maybe it happens because one
> is comparing a model with intercept to a model without intercept, and
> having no intercept probably renders the R-squared criterion meaningless
> in the latter.
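When both candidate models include an intercept and share the same left-hand side and sample, the asymmetry cannot occur. A sketch with two hypothetical regressors x1 and x2 (names and data are assumptions for illustration): TSS is the same for both fits and both have n - 2 residual degrees of freedom, so R² = 1 - RSS/TSS and sigma² = RSS/(n - 2) rank the two models identically.

```r
set.seed(2)                              # hypothetical data, not from the thread
n  <- 24
x1 <- rnorm(n)                           # hypothetical candidate regressor 1
x2 <- rnorm(n)                           # hypothetical candidate regressor 2
y  <- 1 + 0.8 * x1 + 0.3 * x2 + rnorm(n) # hypothetical common left-hand side

s1 <- summary(lm(y ~ x1))
s2 <- summary(lm(y ~ x2))

# Same y, hence same centered TSS; both fits have n - 2 residual df
tss <- sum((y - mean(y))^2)
all.equal(s1$sigma^2 * (n - 2), (1 - s1$r.squared) * tss)   # both sides equal RSS

# So the larger R^2 always goes with the smaller residual standard error
(s1$r.squared > s2$r.squared) == (s1$sigma < s2$sigma)
```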
>
> Thanks for any insight you can provide. I can also center and go without
> an intercept, because it sounded like you DEFINITELY preferred that
> method over just not including an intercept at all. I was thinking of
> sending this question to the R-list, but I didn't want to get hammered
> because I know that this is not a new discussion. Thanks so much.
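Mark's option of centering and then fitting without an intercept can be checked directly. A sketch on hypothetical simulated data: the slope and the R² reported by summary.lm() coincide with those of the ordinary fit, because for a centered response the uncentered sum of squares sum(yc^2) equals the centered TSS of y. Only the residual standard error differs, since the no-intercept fit uses n - 1 instead of n - 2 residual degrees of freedom and does not account for the two estimated means.

```r
set.seed(1)                       # illustrative data, not from the thread
x <- 0:10
y <- x + 4 + rnorm(11)
yc <- y - mean(y)                 # centered response
xc <- x - mean(x)                 # centered regressor

sNormal   <- summary(lm(y ~ x))          # ordinary fit with intercept
sCentered <- summary(lm(yc ~ 0 + xc))    # centered data, intercept suppressed

# Same slope and same R^2
all.equal(unname(coef(sNormal)["x", "Estimate"]),
          unname(coef(sCentered)["xc", "Estimate"]))
all.equal(sNormal$r.squared, sCentered$r.squared)

# Residual standard errors differ: RSS/(n - 2) versus RSS/(n - 1)
c(sNormal$sigma, sCentered$sigma)
```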
>
>
>
> Mark
>
> P.S.: How the heck did you get an MD and a Ph.D.? Unbelievable. Did you
> do them at the same time?
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John Sorkin
> Sent: Thursday, August 23, 2007 9:29 AM
> To: David Barron; Michal Kneifl; r-help
> Subject: Re: [R] How to fit a linear model without intercept
>
> Michael,
> Assuming you want a model with an intercept of zero, I think we need to
> ask you why you want an intercept of zero. When a "normal" regression
> indicates a non-zero intercept, forcing the regression line to have a
> zero intercept changes the meaning of the regression coefficients. If
> for some reason you want a zero intercept but do not want to change the
> meaning of the regression coefficients, i.e. you still want the slope
> that minimizes the sum of squared deviations (the ordinary least-squares
> estimate, which under the classical assumptions is the BLUE, Best Linear
> Unbiased Estimator), you can center your dependent and independent
> variables and re-run the regression. Centering means subtracting the
> mean of each variable from the variable before performing the
> regression. When you do this, the intercept will be zero (or, more
> likely, a very small number that is not statistically different from
> zero; it will not be exactly zero because of the limited precision of
> computer arithmetic), and the slope will be the same as the one you
> obtained from the "normal" regression. What you are actually doing is
> transforming your data so it is centered around x=0, y=0, i.e. the
> means of x and y will be zero. I am not sure this is what you want to
> do, but I am pasting below some R code that will let you see the effect
> that forcing the intercept to be zero has on the slope, and how
> centering the data yields a zero intercept without changing the slope.
> John
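Why centering works can be seen from the closed-form least-squares estimates of the simple regression y = a + bx (the letters a for the intercept and b for the slope, and the superscript c for centered quantities, are notation introduced here, not taken from John's message):

```latex
\begin{aligned}
\hat b &= \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2},
\qquad \hat a = \bar y - \hat b\,\bar x, \\
x_i^c &= x_i-\bar x,\quad y_i^c = y_i-\bar y
\;\Rightarrow\; \bar x^c = \bar y^c = 0,\quad
x_i^c-\bar x^c = x_i-\bar x,\quad y_i^c-\bar y^c = y_i-\bar y, \\
\hat b^c &= \hat b, \qquad \hat a^c = \bar y^c - \hat b^c\,\bar x^c = 0 .
\end{aligned}
```

In exact arithmetic the centered intercept is 0; the tiny non-zero value that lm() reports is floating-point rounding.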
>
>
>
> oldpar <- par(ask = TRUE)
>
> # Set up x and y values. As defined, the slope of the regression
> # should be close to one (save for the "noise" added to the y values)
> # and the intercept should be close to four.
> set.seed(1)   # added for reproducibility; any seed works
> x <- 0:10
> y <- x + 4 + rnorm(11, 0, 1)
> plot(x, y)
> title("Original data")
>
> # Fit a "normal" regression line to the data and display
> # the regression line on the scatter plot.
> fitNormalReg <- lm(y ~ x)
> abline(fitNormalReg)
>
> # Fit a regression line in which the intercept has been forced
> # to be zero and display the line on the scatter plot.
> fitZeroInt <- lm(y ~ -1 + x)
> abline(fitZeroInt, lty = 2)
>
> # Compare fits.
> summary(fitNormalReg)
> summary(fitZeroInt)
> # There is a statistically significant difference between the models;
> # the model with an intercept, the "normal" regression, is the better
> # fit. (The zero-intercept model is nested in the normal one.)
> anova(fitZeroInt, fitNormalReg)
>
> # Center y and x by subtracting their means.
> yCentered <- y - mean(y)
> xCentered <- x - mean(x)
> # Regress the centered y values on the centered x values. This will
> # give a model with an intercept that is very small; it would be zero
> # save for the precision limits inherent in using a computer. Plot the
> # line. Notice the slope of the centered regression is the same as that
> # obtained from the normal regression.
> fitCentered <- lm(yCentered ~ xCentered)
> abline(fitCentered, lty = 3)
>
> # Compare the three regressions. Note the slopes from the "normal"
> # and centered regressions are the same. The intercept from the
> # centered regression is very small and would be zero save for the
> # limits of computer arithmetic.
> summary(fitNormalReg)
> summary(fitZeroInt)
> summary(fitCentered)
>
> # Plot the centered data and show that the line goes through zero.
> plot(xCentered, yCentered)
> abline(fitCentered)
> title("Centered data")
> par(oldpar)
>
>
>
>
>
>
>
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and Baltimore
> VA Center Stroke of Excellence
>
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> jsorkin at grecc.umaryland.edu
>
>
>>>> "David Barron" <mothsailor at googlemail.com> 08/23/07 5:38 AM >>>
>>>>
> A number of alternatives, such as:
>
> lm(y ~ 0 + x)
> lm(y ~ x -1)
>
> See ?formula
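A quick sketch confirming that these spellings, and the y ~ -1 + x form used in John's example, specify the same model. The data are hypothetical; attr(terms(...), "intercept") is 0 when the formula has no intercept.

```r
set.seed(1)                      # hypothetical data, not from the thread
x <- 0:10
y <- x + 4 + rnorm(11)

f0 <- lm(y ~ 0 + x)
f1 <- lm(y ~ x - 1)
f2 <- lm(y ~ -1 + x)             # the form used in John's example

all.equal(coef(f0), coef(f1))
all.equal(coef(f0), coef(f2))
attr(terms(f0), "intercept")     # 0: the model has no intercept term
```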
>
> On 8/23/07, Michal Kneifl <xkneifl at mendelu.cz> wrote:
>
>> Please could anyone help me?
>> How can I fit a linear model where an intercept makes no sense?
>> Thanks in advance.
>>
>> Michael
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> =================================
> David Barron
> Said Business School
> University of Oxford
> Park End Street
> Oxford OX1 1HP
>