[R] Fitting linear models

Marc Schwartz marc_schwartz at me.com
Tue Apr 21 18:11:57 CEST 2009


On Apr 21, 2009, at 10:37 AM, David Winsemius wrote:

>
> On Apr 21, 2009, at 11:12 AM, Vemuri, Aparna wrote:
>
>> David,
>> Thanks for the suggestions. No, I did not label my dependent  
>> variable "function".
>
> That was from my error in reading your call to lm. In my defense, I
> am reasonably sure the proper way to name the argument is
> lm(formula = ...) rather than lm(function = ...).
>>
>>
>> My dependent variable PBW and all the independent variables are
>> continuous variables. It is especially troubling since the order in
>> which I enter the independent variables determines whether or not a
>> given variable gets a coefficient. As I already mentioned, I checked the
>> correlation matrix and picked the variables with moderate to high
>> correlation with the dependent variable. So I guess it is not
>> so naïve to expect a regression coefficient for each of them.
>>
>> Dimitri,
>> model1 <- lm(PBW ~ SO4 + NO3 + NH4) gives me the same result as before.
>
> Did you get the expected results with:
> model1<-lm(formula=PBW~SO4+NO3+NH4+0)
>
> You could, of course, provide either the data or the results of  
> str() applied to each of the variables and then we could all stop  
> guessing.

I am going to take a wild stab in the dark here and suggest that 'NH4'
is perfectly correlated with, or even identical to, one of the other
IVs used in the formula.

  set.seed(1)
  PBW <- rnorm(100)
  SO4 <- rnorm(100)
  NO3 <- rnorm(100)
  NH4 <- rnorm(100)

 > lm(PBW ~ SO4 + NO3 + NH4)

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Coefficients:
(Intercept)          SO4          NO3          NH4
     0.11065     -0.00273      0.02096     -0.04826


Now watch:

NH4 <- NO3 * 1.5

 > lm(PBW ~ SO4 + NO3 + NH4)

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Coefficients:
(Intercept)          SO4          NO3          NH4
   1.084e-01   -7.871e-05    1.596e-02           NA
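This also explains why the order of the IVs seems to matter. A minimal sketch, reusing the simulated data above: lm() fits via a pivoted QR decomposition and gives an NA coefficient to whichever collinear column it encounters later in the formula, so swapping NO3 and NH4 moves the NA from one to the other. alias() from the stats package reports the exact dependency it found.

```r
# Reproduce the collinear setup from above: NH4 is an exact multiple of NO3
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
NH4 <- NO3 * 1.5

# Same variables, two different orders in the formula
fit1 <- lm(PBW ~ SO4 + NO3 + NH4)
fit2 <- lm(PBW ~ SO4 + NH4 + NO3)

# The column that comes later among the collinear pair gets the NA
names(coef(fit1))[is.na(coef(fit1))]   # "NH4"
names(coef(fit2))[is.na(coef(fit2))]   # "NO3"

# alias() shows the aliased term as a linear combination of the others
alias(fit1)
```

Neither variable is "worse" than the other; once one is known, the other carries no additional information, so only one of them can receive a coefficient.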


 > cor(cbind(SO4, NO3, NH4))
             SO4         NO3         NH4
SO4  1.00000000 -0.04953621 -0.04953621
NO3 -0.04953621  1.00000000  1.00000000
NH4 -0.04953621  1.00000000  1.00000000
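A pairwise correlation matrix catches the case where one IV is a rescaled copy of another, but it can miss an exact dependency involving three or more variables. The sketch below uses a hypothetical NH4 built as the sum of SO4 and NO3 (an assumption for illustration only): no pairwise correlation reaches 1, yet lm() still returns NA. Comparing the rank of the design matrix with its number of columns detects both kinds of dependency.

```r
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
# Hypothetical: NH4 is the sum of the other two, not a multiple of either
NH4 <- SO4 + NO3

# No pairwise correlation equals 1 ...
round(cor(cbind(SO4, NO3, NH4)), 2)

# ... yet lm() still cannot estimate every coefficient
fit <- lm(PBW ~ SO4 + NO3 + NH4)
coef(fit)                      # NH4 is NA

# Rank of the design matrix (intercept plus three IVs) versus its width
X <- model.matrix(fit)
qr(X)$rank                     # 3
ncol(X)                        # 4
```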


I suspect that there is a collinearity problem here. Aparna, post back
with the correlation matrix of your IVs (full data set); that should
either support or refute my theory. If it is supported, summary() will
flag the problem:

 > summary(lm(PBW ~ SO4 + NO3 + NH4))

Call:
lm(formula = PBW ~ SO4 + NO3 + NH4)

Residuals:
      Min       1Q   Median       3Q      Max
-2.30129 -0.60350  0.01765  0.58513  2.27806

Coefficients: (1 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.084e-01  9.083e-02   1.194    0.236
SO4         -7.871e-05  9.531e-02  -0.001    0.999
NO3          1.596e-02  8.827e-02   0.181    0.857
NH4                 NA         NA      NA       NA

Residual standard error: 0.9073 on 97 degrees of freedom
Multiple R-squared: 0.0003379,	Adjusted R-squared: -0.02027
F-statistic: 0.01639 on 2 and 97 DF,  p-value: 0.9837


Note the message "(1 not defined because of singularities)" and the
row of NAs for NH4.
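That note is printed, not raised as an R warning, so it is easy to miss when fitting models in a script. A minimal sketch, again on the simulated data: summary.lm() stores a named logical vector, aliased, marking the coefficients that could not be estimated, which can be checked directly. The message() guard is just one illustrative way to surface it.

```r
set.seed(1)
PBW <- rnorm(100)
SO4 <- rnorm(100)
NO3 <- rnorm(100)
NH4 <- NO3 * 1.5

fit <- lm(PBW ~ SO4 + NO3 + NH4)

# Named logical vector: TRUE for coefficients dropped due to singularities
aliased <- summary(fit)$aliased
aliased                        # TRUE only for NH4
names(aliased)[aliased]        # "NH4"

# Illustrative guard for scripts that fit many models
if (any(aliased)) {
  message("Not estimable because of singularities: ",
          paste(names(aliased)[aliased], collapse = ", "))
}
```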

BTW, picking variables for a model based upon their correlation with
the DV is not a good way to go. You might want to pick up a copy of
Frank Harrell's book "Regression Modeling Strategies":

   http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

HTH,

Marc Schwartz
