[R] Fitting linear models
Vemuri, Aparna
avemuri at epri.com
Tue Apr 21 18:21:44 CEST 2009
Thanks Dimitri! Following exactly what you did, I wrote all my individual variable vectors to a data frame and used lm(formula,data) and this time it works for me too.
Marc, your theory is correct. The NH4 variable is strongly correlated with one of the IVs (SO4) as well as with the DV.
         SO4       NO3       NH4       PBW
SO4   1        -0.0867    0.999     0.999
NO3  -0.0867    1        -0.0527   -0.0938
NH4   0.999    -0.0527    1         0.999
PBW   0.999    -0.0938    0.999     1
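
A pairwise correlation that rounds to 0.999 is not, on its own, enough to produce an NA coefficient: as far as I know, lm() drops a term only when it is numerically a linear combination of the terms before it. The sketch below shows one way to check for that with alias(). The data are synthetic, and the exact dependence of NH4 on SO4 and NO3 is an assumption made for illustration; only the column names come from this thread.

```r
# Synthetic stand-in data: column names mirror the thread, values do not.
set.seed(1)
n   <- 100
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 5)
NH4 <- 0.375 * SO4 + 0.29 * NO3        # assumed exact linear dependence
PBW <- 0.02 * SO4 + 0.018 * NO3 + rnorm(n, sd = 0.001)
dat <- data.frame(PBW, SO4, NO3, NH4)

round(cor(dat), 3)                     # pairwise correlations, as in the matrix above

fit <- lm(PBW ~ SO4 + NO3 + NH4, data = dat)
coef(fit)                              # NH4 is reported as NA
alias(fit)$Complete                    # NH4 written in terms of the other columns
```

If alias() reports no complete aliasing on the real data, the NA most likely has another cause, such as the variables not being the ones the formula actually picked up.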
Aparna
-----Original Message-----
From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
Sent: Tuesday, April 21, 2009 9:02 AM
To: Vemuri, Aparna
Cc: r-help at r-project.org; David Winsemius
Subject: Re: [R] Fitting linear models
I am not sure what the problem is.
I found no errors:
data<-read.csv(file.choose()) # I had to change your file extension to .csv first
dim(data)
names(data)
lapply(data,function(x){sum(is.na(x))})
lm.model.1<-lm(PBW~SO4+NO3+NH4,data)
lm.model.2<-lm(PBW~SO4+NH4+NO3,data)
print(lm.model.1) # Getting nice results
print(lm.model.2) # Getting same results
# Another method (gets exactly the same results):
library(Design)
ols.model.1<-ols(PBW~SO4+NO3+NH4,data)
ols.model.2<-ols(PBW~SO4+NH4+NO3,data)
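
The comparison of the two orderings can also be done programmatically rather than by eye. The sketch below uses synthetic data in which NH4 is, by assumption, an exact linear combination of SO4 and NO3, so that the effect of term order is visible; the names fit_a and fit_b are made up for this example.

```r
# Synthetic data with a deliberately aliased predictor (an assumption,
# not the data attached to this thread).
set.seed(2)
n   <- 100
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 5)
NH4 <- 0.375 * SO4 + 0.29 * NO3
PBW <- 0.02 * SO4 + 0.018 * NO3 + rnorm(n, sd = 0.001)
dat <- data.frame(PBW, SO4, NO3, NH4)

fit_a <- lm(PBW ~ SO4 + NO3 + NH4, data = dat)   # NH4 entered last
fit_b <- lm(PBW ~ NH4 + NO3 + SO4, data = dat)   # SO4 entered last

names(which(is.na(coef(fit_a))))          # term dropped in the first ordering
names(which(is.na(coef(fit_b))))          # term dropped in the second ordering
all.equal(fitted(fit_a), fitted(fit_b))   # fitted values agree either way
```

With an aliased set of predictors, the NA lands on whichever redundant term comes later in the formula, while the fitted values stay the same. Without aliasing, both orderings should return the same coefficients.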
Dimitri
On Tue, Apr 21, 2009 at 11:50 AM, Vemuri, Aparna <avemuri at epri.com> wrote:
> Attached are the first hundred rows of my data in comma separated format.
> Forcing the regression line through the origin still does not give a coefficient on the last independent variable. Also, I don't mind if there is a coefficient on the dependent axis. I just want all of the variables to have coefficients in the regression equation, or at least a consistent result, irrespective of the order of input information.
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Tuesday, April 21, 2009 8:38 AM
> To: Vemuri, Aparna
> Cc: r-help at r-project.org
> Subject: Re: [R] Fitting linear models
>
>
> On Apr 21, 2009, at 11:12 AM, Vemuri, Aparna wrote:
>
>> David,
>> Thanks for the suggestions. No, I did not label my dependent
>> variable "function".
>
> That was from my error in reading your call to lm. In my defense I am
> reasonably sure the proper assignment to arguments is lm(formula= ...)
> rather than lm(function= ...).
>>
>>
>> My dependent variable PBW and all the independent variables are
>> continuous variables. It is especially troubling since the order in
>> which I input independent variables determines whether or not it
>> gets a coefficient. Like I already mentioned, I checked the
>> correlation matrix and picked the variables with moderate to high
>> correlation with the independent variable. So I guess it is not so
>> naïve to expect a regression coefficient on all of them.
>>
>> Dimitri
>> model1<-lm(PBW~SO4+NO3+NH4), gives me the same result as before.
>
> Did you get the expected results with:
> model1<-lm(formula=PBW~SO4+NO3+NH4+0)
>
> You could, of course, provide either the data or the results of str()
> applied to each of the variables and then we could all stop guessing.
>
>>
>> Aparna
>>
>>>
>>>
>>> I am using the lm() function in R to fit a dependent variable to a set
>>> of 3 to 5 independent variables. For this, I used the following commands:
>>>
>>>> model1<-lm(function=PBW~SO4+NO3+NH4)
>>> Coefficients:
>>> (Intercept)          SO4          NO3          NH4
>>>     0.01323      0.01968      0.01856           NA
>>>
>>> and
>>>
>>>> model2<-lm(function=PBW~SO4+NO3+NH4+Na+Cl)
>>>
>>> Coefficients:
>>> (Intercept)          SO4          NO3          NH4           Na           Cl
>>>  -0.0006987   -0.0119750   -0.0295042    0.0842989    0.1344751           NA
>>>
>>> In both cases, the last independent variable has a coefficient of NA in
>>> the result. I say last variable because, when I change the order of the
>>> variables, the coefficient changes (see below). Can anyone point me to
>>> the reason R behaves this way? Is there any way for me to force R to use
>>> all the variables? I checked the correlation matrices to make sure
>>> there is no orthogonality between the variables.
>>
>> You really did not name your dependent variable "function" did you?
>> Please stop that.
>>
>> Just a guess, ... since you have not provided enough information to do
>> otherwise, ... Are all of those variables 1/0 dummy variables? If so
>> and if you want to have an output that satisfies your need for
>> labeling the coefficients as you naively anticipate, then put "0+" at
>> the beginning of the formula or "-1" at the end, so that the intercept
>> will disappear and then all variables will get labeled as you expect.
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
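
The intercept-suppression syntax mentioned in the quoted replies ("0+" at the start of the formula, or "-1" or "+0" at the end) can be sketched as follows. The variables x and y are made-up illustration data, not anything from the thread.

```r
# Made-up data for illustration only.
set.seed(3)
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.1)

fit_zero  <- lm(y ~ 0 + x)    # "0 +" at the start of the formula
fit_minus <- lm(y ~ x - 1)    # "- 1" at the end of the formula

names(coef(fit_zero))                        # no "(Intercept)" entry
all.equal(coef(fit_zero), coef(fit_minus))   # the two spellings agree
```

Dropping the intercept only changes which columns are in the model; it would not be expected to give a coefficient to a predictor that is a linear combination of the others.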
--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com