[R] Interpreting lm() results with factor
peter dalgaard
pdalgd at gmail.com
Tue Dec 3 14:46:59 CET 2013
On 03 Dec 2013, at 01:08 , David Gwenzi <dgwenzi at gmail.com> wrote:
> Dear all
>
> I have observations from 4 different classes, and the between-class
> *variance* is so high that I decided to run a model without pooling the
> *variance*. I first used the following code:
> model<-lm(y~x+factor(class))
> and got the following output:
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)     52.41405   17.38161   3.015  0.00658 **
> x                0.27679    0.07387   3.747  0.00119 **
> factor(class)2  92.68083   32.26645   2.872  0.00912 **
> factor(class)3 197.82029   33.24916   5.950 6.63e-06 ***
> factor(class)4 105.61266   55.18373   1.914  0.06937 .
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 43.07 on 21 degrees of freedom
> Multiple R-squared: 0.9206, Adjusted R-squared: 0.9055
> F-statistic: 60.91 on 4 and 21 DF, p-value: 2.976e-11
>
> My understanding of this output is that class 1 is used as the baseline
> (constant), and that each other class's p-value shows whether, for example,
> the dependent value in class 2 is significantly different from that in
> class 1. Now I ran the model again, but without a constant, i.e.
> model<-lm(y~x+factor(class)-1)
> and got the following output:
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> x                0.27679    0.07387   3.747  0.00119 **
> factor(class)1  52.41405   17.38161   3.015  0.00658 **
> factor(class)2 145.09488   39.42651   3.680  0.00139 **
> factor(class)3 250.23434   40.61189   6.162 4.11e-06 ***
> factor(class)4 158.02672   64.09549   2.465  0.02238 *
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 43.07 on 21 degrees of freedom
> Multiple R-squared: 0.9801, Adjusted R-squared: 0.9754
> F-statistic: 207.1 on 5 and 21 DF, p-value: < 2.2e-16
>
> Can somebody please tell me how to interpret this one now? What do the
> classes' p-values mean? Do they merely show whether the classes contribute
> significantly to the model, or whether they are significantly different
> from the overall mean? If one class had a p-value > 0.05, would that mean
> the observations from that class do not contribute significantly to the
> model?
The estimates are the per-class intercepts, and each P-value corresponds to a test that the intercept in question is zero (which is very rarely a relevant hypothesis).
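The two fits are the same model in different parametrizations, which can be seen in the posted output: 52.41405 + 92.68083 = 145.09488, i.e. the class 2 intercept is the baseline intercept plus the class 2 contrast. A minimal sketch of this on simulated data follows. The data frame d and its values are made up for illustration; only the variable names y, x and class follow the original post.

```r
# Simulated data (hypothetical values; names mirror the original post).
set.seed(1)
d <- data.frame(
  x     = runif(40, 0, 100),
  class = factor(rep(1:4, each = 10))
)
shift <- c(0, 90, 200, 100)                    # assumed per-class offsets
d$y <- 50 + 0.3 * d$x + shift[as.integer(d$class)] + rnorm(40, sd = 20)

fit_base <- lm(y ~ x + class, data = d)        # class 1 is the baseline
fit_cell <- lm(y ~ x + class - 1, data = d)    # one intercept per class

b <- coef(fit_base)
a <- coef(fit_cell)

# Per-class intercepts = baseline intercept + treatment contrasts.
rebuilt <- c(b["(Intercept)"],
             b["(Intercept)"] + b[c("class2", "class3", "class4")])
same_intercepts <- all.equal(
  unname(rebuilt),
  unname(a[c("class1", "class2", "class3", "class4")])
)

# Both parametrizations give the same fitted values and residual SE.
same_fit   <- all.equal(fitted(fit_base), fitted(fit_cell))
same_sigma <- all.equal(summary(fit_base)$sigma, summary(fit_cell)$sigma)

# In fit_cell, each t value tests "this class's intercept is zero".
print(summary(fit_cell)$coefficients)

# A joint test of whether the classes differ at all (given x):
print(anova(lm(y ~ x, data = d), fit_base))
```

If the question is whether classes differ from each other, the first parametrization (or the anova() comparison above) is the one that addresses it; the t tests in the no-intercept fit do not.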
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com