[R] low R square value from ANCOVA model

Paul Johnson pauljohn32 at gmail.com
Wed May 9 05:23:38 CEST 2012


On Tue, May 8, 2012 at 3:45 PM, array chip <arrayprofile at yahoo.com> wrote:
> Thanks again Peter. What about the argument that, because a low R square (e.g. R^2=0.2) indicates the model variance was not sufficiently explained by the factors in the model, there might be additional factors that should be identified and included in the model? And if these additional factors were indeed included, they might change the significance of the factor of interest that previously showed a significant coefficient. In other words, if R square is low, the significant coefficient observed is not trustworthy.
>
> What's your opinion on this argument?

I think that argument is silly. I'm sorry if that is too blunt. It's
just plain superficial. It reflects a poor understanding of what the
linear model is all about. If you have other variables that might
"belong" in the model, add them to the model and test them. The
R-square, whether low or high, says nothing direct about whether those
other variables exist.
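A small simulation makes this concrete. The sketch below is in Python (standard library only) rather than R, and every name, coefficient and noise level in it is invented for illustration. It builds two hypothetical worlds in which the regression of y on x has the same population R^2 (about 0.07). In world A nothing is missing from the model; in world B a variable z, which we pretend was never measured, drives most of the variation in y. Fitting y on x alone gives nearly the same R^2, slope and t statistic in both worlds, so the R^2 cannot tell you which world you are in.

```python
import math
import random
import statistics


def simple_ols(x, y):
    """OLS fit of y = a + b*x. Returns (slope, t statistic of slope, R^2)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b, b / se_b, 1 - sse / sst


rng = random.Random(2012)  # fixed seed for reproducibility
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical extra variable, independent of x

# World A: nothing is omitted; y is 0.5*x plus pure noise with variance 3.25.
y_a = [0.5 * xi + rng.gauss(0, math.sqrt(3.25)) for xi in x]
# World B: an omitted z matters (1.5*z has variance 2.25) and the remaining
# noise has variance 1, so Var(y) = 0.25 + 3.25 = 3.5 in both worlds.
y_b = [0.5 * xi + 1.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

b_a, t_a, r2_a = simple_ols(x, y_a)
b_b, t_b, r2_b = simple_ols(x, y_b)
print(f"World A: slope={b_a:.3f} t={t_a:.1f} R^2={r2_a:.3f}")
print(f"World B: slope={b_b:.3f} t={t_b:.1f} R^2={r2_b:.3f}")
```

In this sketch z is independent of x, so leaving it out does not bias the slope. If z were correlated with x, omitting it would bias the x coefficient, and the R^2 of y on x still would not reveal that; the only way to find out is to measure z, put it in the model, and test it.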

Here's my authority.

Arthur Goldberger (A Course in Econometrics, 1991, p. 177):
“Nothing in the CR (Classical Regression) model requires that R^2 be high.
Hence, a high R^2 is not evidence in favor of the model, and a low R^2 is
not evidence against it.”

I found that reference in Anders Skrondal and Sophia Rabe-Hesketh,
Generalized Latent Variable Modeling: Multilevel, Longitudinal,
and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004.

From Section 8.5.2:

"Furthermore, how badly the baseline model fits the data depends greatly
on the magnitude of the parameters of the true model. For instance, consider
estimating a simple parallel measurement model. If the true model is a
congeneric measurement model (with considerable variation in factor loadings
and measurement error variances between items), the fit index could be high
simply because the null model fits very poorly, i.e. because the
reliabilities of the items are high. However, if the true model is a parallel
measurement model with low reliabilities the fit index could be low although
we are estimating the correct model. Similarly, estimating a simple linear
regression model can yield a high R^2 if the relationship is actually
quadratic with a considerable linear trend and a low R^2 when the model is
true but with a small slope (relative to the overall variance)."
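The last sentence of that passage is easy to check by simulation. The Python sketch below (standard library only) fits a straight line in two situations: data generated from a quadratic with a strong linear trend, and data generated from a straight line with a small slope relative to the noise. The coefficients, the range of x, the noise level and the seed are arbitrary choices made for illustration; they are not taken from the book.

```python
import random
import statistics


def r2_and_slope(x, y):
    """Fit the straight line y = a + b*x by OLS; return (slope, R^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return b, 1 - sse / sst


rng = random.Random(8)  # fixed seed for reproducibility
n = 5_000
x = [rng.uniform(0, 10) for _ in range(n)]

# Wrong model, high R^2: the truth is quadratic with a strong linear trend,
# but we fit a straight line to it.
y_quad = [2 * xi + 0.1 * xi ** 2 + rng.gauss(0, 1) for xi in x]
# Right model, low R^2: the truth is a straight line with a small slope.
y_flat = [0.1 * xi + rng.gauss(0, 1) for xi in x]

b_quad, r2_quad = r2_and_slope(x, y_quad)
b_flat, r2_flat = r2_and_slope(x, y_flat)
print(f"misspecified line on quadratic data: R^2={r2_quad:.3f}")
print(f"correct line, small slope:           R^2={r2_flat:.3f}, slope={b_flat:.3f}")
```

With these settings the misspecified line should give an R^2 in the neighbourhood of 0.98 and the correct line one in the neighbourhood of 0.08, while the estimated slope in the second case still lands close to the true 0.1.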

For a detailed explanation of why the R-square is not a way to decide
whether a model is "good" or "bad", see

King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in
Quantitative Political Science. American Journal of Political Science,
30(3), 666–687. doi:10.2307/2111095

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu
