[R] simple lm question

David Winsemius dwinsemius at comcast.net
Sat Dec 3 05:41:01 CET 2011


On Dec 2, 2011, at 11:20 PM, Worik R wrote:

> Duh!  Silly me!  But my confusion persists:  What is the regression being
> done?  See below....

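<Sigh>  Please note that your "df" and "M" are undoubtedly different
objects by now: "df" was presumably built from an earlier "M", and
re-running runif() to recreate "M" does not update it. When both calls
see the same data, they agree: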

 > M <- matrix(runif(5*20), nrow=20)
 > colnames(M) <- c('a', 'b', 'c', 'd', 'e')
 > l1 <- lm(e~., data=as.data.frame(M))
 > l1

Call:
lm(formula = e ~ ., data = as.data.frame(M))

Coefficients:
(Intercept)            a            b            c            d
     0.40139     -0.15032     -0.06242      0.13139      0.23905

 > l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
 > l3

Call:
lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

Coefficients:
(Intercept)       M[, 1]       M[, 2]       M[, 3]       M[, 4]
     0.40139     -0.15032     -0.06242      0.13139      0.23905

As expected.
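
For anyone who wants to reproduce the mismatch, here is a minimal sketch. It assumes that "df" was created by as.data.frame(M) before M was regenerated, which the original post does not show. The seed and the names fit_df, fit_mat, fit_mat2, same_before and same_after are made up for illustration.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
df <- as.data.frame(M)                       # a copy of M as it is right now

fit_df  <- lm(e ~ ., data = df)
fit_mat <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

# Same data, so the coefficients agree (only their names differ)
same_before <- isTRUE(all.equal(unname(coef(fit_df)), unname(coef(fit_mat))))

# Regenerate M; df still holds the old values
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
fit_mat2 <- lm(M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])

# Different data now, so the two fits no longer match
same_after <- isTRUE(all.equal(unname(coef(fit_df)), unname(coef(fit_mat2))))
```

Rebuilding the data frame with df <- as.data.frame(M) after every change to M (or passing as.data.frame(M) directly, as above) keeps the two in step.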

-- 
David.

>
> On Sat, Dec 3, 2011 at 5:10 PM, R. Michael Weylandt <
> michael.weylandt at gmail.com> wrote:
>
>> In your code by supplying a vector M[,"e"] you are regressing "e"
>> against all the variables provided in the data argument, including "e"
>> itself -- this gives the very strange regression coefficients you
>> observe. R has no way to know that that's somehow related to the "e"
>> it sees in the data argument.
>>
>
>> In the suggested way,
>>
>> lm(formula = e ~ ., data = as.data.frame(M))
>>
>> e is regressed against everything that is not e and sensible results are
>> given.
>>
>
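
The point about "." can be checked directly. This is only a sketch: the seed and the names dat, bad, good, labels_bad and labels_good are invented here and are not from the thread. It compares the terms that lm() ends up with when the response is written as M[, "e"] versus as the column name e.

```r
set.seed(2)                                  # arbitrary seed, for reproducibility only
M <- matrix(runif(5 * 20), nrow = 20)
colnames(M) <- c("a", "b", "c", "d", "e")
dat <- as.data.frame(M)

# Response given as a vector: "." expands to every column of dat, e included
bad  <- lm(M[, "e"] ~ ., data = dat)

# Response given by name: "." expands to every column except e
good <- lm(e ~ ., data = dat)

labels_bad  <- attr(terms(bad),  "term.labels")
labels_good <- attr(terms(good), "term.labels")

print(labels_bad)
print(labels_good)
print(round(coef(bad), 6))                   # e predicts itself, so its coefficient is about 1
```

In the first fit e appears on both sides, so the fit is essentially perfect and the other coefficients collapse toward zero.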
> But still 'l1 <- lm(e~., data=df)' is not the same as 'l3 <-
> lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])'
>
>> M <- matrix(runif(5*20), nrow=20)
>> colnames(M) <- c('a', 'b', 'c', 'd', 'e')
>> l1 <- lm(e~., data=df)
>> summary(l1)
>
> Call:
> lm(formula = e ~ ., data = df)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -0.38343 -0.21367  0.03067  0.13757  0.49080
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.28521    0.29477   0.968    0.349
> a            0.09283    0.30112   0.308    0.762
> b            0.23921    0.22425   1.067    0.303
> c           -0.16027    0.24154  -0.664    0.517
> d            0.24025    0.20054   1.198    0.250
>
> Residual standard error: 0.2871 on 15 degrees of freedom
> Multiple R-squared: 0.1602,    Adjusted R-squared: -0.06375
> F-statistic: 0.7153 on 4 and 15 DF,  p-value: 0.5943
>
>> l3 <- lm(M[,5]~M[,1]+M[,2]+M[,3]+M[,4])
>> summary(l3)
>
> Call:
> lm(formula = M[, 5] ~ M[, 1] + M[, 2] + M[, 3] + M[, 4])
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -0.36355 -0.22679 -0.01202  0.18462  0.37377
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.76972    0.24501   3.142  0.00672 **
> M[, 1]      -0.23830    0.24123  -0.988  0.33890
> M[, 2]      -0.02046    0.21958  -0.093  0.92699
> M[, 3]      -0.29518    0.22559  -1.308  0.21040
> M[, 4]      -0.31545    0.24570  -1.284  0.21866
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2668 on 15 degrees of freedom
> Multiple R-squared: 0.2762,    Adjusted R-squared: 0.08317
> F-statistic: 1.431 on 4 and 15 DF,  p-value: 0.272
>

David Winsemius, MD
West Hartford, CT


