[R] Fitting linear models
Marc Schwartz
marc_schwartz at me.com
Tue Apr 21 18:11:57 CEST 2009
On Apr 21, 2009, at 10:37 AM, David Winsemius wrote:
>
> On Apr 21, 2009, at 11:12 AM, Vemuri, Aparna wrote:
>
>> David,
>> Thanks for the suggestions. No, I did not label my dependent
>> variable "function".
>
> That was from my error in reading your call to lm. In my defense I
> am reasonably sure the proper assignment to arguments is
> lm(formula= ...) rather than lm(function= ...).
>>
>>
>> My dependent variable PBW and all the independent variables are
>> continuous variables. It is especially troubling since the order in
>> which I input independent variables determines whether or not it
>> gets a coefficient. Like I already mentioned, I checked the
>> correlation matrix and picked the variables with moderate to high
>> correlation with the dependent variable. So I guess it is not
>> so naïve to expect a regression coefficient on all of them.
>>
>> Dimitri
>> model1<-lm(PBW~SO4+NO3+NH4), gives me the same result as before.
>
> Did you get the expected results with:
> model1<-lm(formula=PBW~SO4+NO3+NH4+0)
>
> You could, of course, provide either the data or the results of
> str() applied to each of the variables and then we could all stop
> guessing.
I am going to take a wild stab in the dark here and suggest that 'NH4'
is perfectly correlated with, or even identical to, one of the other
IVs used in the formula.
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
NH4 <- rnorm(100)
> lm(PBW ~ SO4 + NO3 + NH4)

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Coefficients:
(Intercept)          SO4          NO3          NH4
    0.11065     -0.00273      0.02096     -0.04826

Now watch:
NH4 <- NO3 * 1.5
> lm(PBW ~ SO4 + NO3 + NH4)

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Coefficients:
(Intercept)          SO4          NO3          NH4
  1.084e-01   -7.871e-05    1.596e-02           NA

> cor(cbind(SO4, NO3, NH4))
            SO4         NO3         NH4
SO4  1.00000000 -0.04953621 -0.04953621
NO3 -0.04953621  1.00000000  1.00000000
NH4 -0.04953621  1.00000000  1.00000000

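This may also account for the order dependence Aparna described: when two IVs carry the same information, lm() keeps the one it meets first in the formula and reports NA for the later one. A minimal sketch on the same simulated data (the object names fit1 and fit2 are only for illustration):

```r
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
NH4 <- NO3 * 1.5   # NH4 is an exact multiple of NO3

fit1 <- lm(PBW ~ SO4 + NO3 + NH4)   # NH4 entered last
fit2 <- lm(PBW ~ SO4 + NH4 + NO3)   # NO3 entered last

coef(fit1)   # the NH4 coefficient is NA
coef(fit2)   # the NO3 coefficient is NA

# Both fits describe the same model, so the fitted values agree
all.equal(fitted(fit1), fitted(fit2))
```

Only the labelling of the redundant term changes with the order; the fit itself does not.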
I suspect that there is a collinearity problem here. Aparna, post back
with the correlation matrix of your IVs (full data set) and that
should either support or refute my theory. If it is supported,
summary() on the fitted model will flag the problem:
> summary(lm(PBW ~ SO4 + NO3 + NH4))

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Residuals:
     Min       1Q   Median       3Q      Max
-2.30129 -0.60350  0.01765  0.58513  2.27806

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.084e-01  9.083e-02   1.194    0.236
SO4         -7.871e-05  9.531e-02  -0.001    0.999
NO3          1.596e-02  8.827e-02   0.181    0.857
NH4                 NA         NA      NA       NA

Residual standard error: 0.9073 on 97 degrees of freedom
Multiple R-squared: 0.0003379,  Adjusted R-squared: -0.02027
F-statistic: 0.01639 on 2 and 97 DF,  p-value: 0.9837

Note the "(1 not defined because of singularities)" message in the
Coefficients header and the row of NAs for NH4.
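
A correlation matrix only shows pairwise relationships, so an IV that is a linear combination of several others may not stand out there. One way to check directly, sketched here on the same simulated data (the names fit and X are only for illustration), is to compare the rank lm() found with the number of columns in the design matrix and to ask alias() which term is redundant:

```r
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
NH4 <- NO3 * 1.5   # NH4 is an exact multiple of NO3

fit <- lm(PBW ~ SO4 + NO3 + NH4)

X <- model.matrix(fit)
ncol(X)     # number of columns in the design matrix, intercept included
fit$rank    # rank found by lm(); a smaller value means aliased terms

# Names of the terms lm() could not estimate
names(which(is.na(coef(fit))))

# Shows how each aliased term depends on the estimable ones
alias(fit)
```

If fit$rank equals ncol(X) on the real data, exact collinearity is not the explanation and something else is going on.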
As an aside, picking variables for a model based upon their
correlation with the DV is not a good way to go. You might want to
pick up a copy of Frank Harrell's book "Regression Modeling Strategies":
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS
HTH,
Marc Schwartz