[R] low R square value from ANCOVA model
Bert Gunter
gunter.berton at gene.com
Wed May 9 07:07:35 CEST 2012
"It gets curiouser and curiouser," said Alice.
-- Bert
On Tue, May 8, 2012 at 9:07 PM, array chip <arrayprofile at yahoo.com> wrote:
> Paul, thanks for your thoughts. blunt, not at all....
>
> If I understand correctly, it doesn't help to speculate about whether additional variables might exist. Given the current variables in the model, it's perfectly fine to draw conclusions based on significant coefficients, regardless of whether R-squared is high or low.
>
> Gary King's article is interesting...
>
> John
>
>
>
> ________________________________
> From: Paul Johnson <pauljohn32 at gmail.com>
>
> Cc: peter dalgaard <pdalgd at gmail.com>; "r-help at r-project.org" <r-help at r-project.org>
> Sent: Tuesday, May 8, 2012 8:23 PM
> Subject: Re: [R] low R square value from ANCOVA model
>
>
>> Thanks again, Peter. What about the argument that, because a low R-squared (e.g. R^2 = 0.2) indicates that the factors in the model do not sufficiently explain the variance, there might be additional factors that should be identified and included in the model? And if these additional factors were indeed included, they might change the significance of the factor of interest that previously showed a significant coefficient. In other words, if R-squared is low, the significant coefficient observed is not trustworthy.
>>
>> What's your opinion on this argument?
>
> I think that argument is silly. I'm sorry if that is too blunt. It's just
> plain superficial. It reflects a poor understanding of what the linear model
> is all about. If you have other variables that might "belong" in the model,
> run them and test. The R-square, either low or high, does not have anything
> direct to say about whether those other variables exist.
>
> Here's my authority.
>
> Arthur Goldberger (A Course in Econometrics, 1991, p.177)
> “Nothing in the CR (Classical Regression) model requires that R2 be high. Hence,
> a high R2 is not evidence in favor of the model, and a low R2 is not evidence
> against it.”
>
> I found that reference in Anders Skrondal and Sophia Rabe-Hesketh,
> Generalized Latent Variable Modeling: Multilevel, Longitudinal,
> and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004.
>
> From Section 8.5.2:
>
> "Furthermore, how badly the baseline model fits the data depends greatly
> on the magnitude of the parameters of the true model. For instance, consider
> estimating a simple parallel measurement model. If the true model is a
> congeneric measurement model (with considerable variation in factor loadings
> and measurement error variances between items), the fit index could be high
> simply because the null model fits very poorly, i.e. because the reliabilities
> of the items are high. However, if the true model is a parallel measurement model
> with low reliabilities the fit index could be low although we are estimating the
> correct model. Similarly, estimating a simple linear regression model can yield
> a high R2 if the relationship is actually quadratic with a considerable linear
> trend and a low R2 when the model is true but with a small slope (relative to
> the overall variance)."
>
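The last sentence of that passage can be checked numerically. What follows is a minimal sketch, not anything from the thread: it uses plain Python (standard library only) rather than R, and the helper `ols_summary`, the sample size, the seed, and the slope of 0.2 are all illustrative assumptions. It fits a straight line to data that are exactly quadratic (wrong model, high R^2) and to data generated from a true linear model with a small slope and unit-variance noise (correct model, low R^2, yet a clearly significant slope).

```python
import random


def ols_summary(x, y):
    """Fit y = a + b*x by least squares; return (slope, r_squared, t_stat)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares of the fitted line.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - rss / syy
    # Standard error of the slope and the usual t statistic for H0: slope = 0.
    se_slope = (rss / (n - 2) / sxx) ** 0.5
    t_stat = slope / se_slope
    return slope, r_squared, t_stat


# Case 1: the true relationship is quadratic, but a straight line is fitted.
x_quad = [float(i) for i in range(11)]
y_quad = [xi ** 2 for xi in x_quad]
quad = ols_summary(x_quad, y_quad)

# Case 2: the linear model is true, but the slope is small relative to the noise.
random.seed(2012)  # arbitrary seed, only for reproducibility
x_weak = [random.gauss(0.0, 1.0) for _ in range(2000)]
y_weak = [0.2 * xi + random.gauss(0.0, 1.0) for xi in x_weak]
weak = ols_summary(x_weak, y_weak)

print("misspecified line on quadratic data: R^2 = %.3f" % quad[1])
print("true model, small slope: R^2 = %.3f, t = %.1f" % (weak[1], weak[2]))
```

In the first case R^2 is about 0.93 even though the fitted model is wrong. In the second, the population R^2 is 0.04 / 1.04, roughly 0.04, while the slope's t statistic should still come out far above conventional significance thresholds at this sample size.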
> For a detailed explanation of why the R-square is not a way to decide whether
> a model is "good" or "bad", see
>
> King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in
> Quantitative Political Science. American Journal of Political Science,
> 30(3), 666–687. doi:10.2307/2111095
>
> pj
> --
> Paul E. Johnson
> Professor, Political Science        Assoc. Director
> 1541 Lilac Lane, Room 504           Center for Research Methods
> University of Kansas                University of Kansas
> http://pj.freefaculty.org           http://quant.ku.edu
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm