[R] AER ivreg diagnostics: question on DF of Sargan test

Achim Zeileis Achim.Zeileis at uibk.ac.at
Thu Nov 7 19:07:21 CET 2013


Hélène,

thanks for spotting this! This is a bug in "AER". I had only tested the 
new diagnostics for regressions with one endogenous variable and hence 
never noticed the problem. But if there is more than one endogenous 
variable, the df used in ivreg() (and hence the associated p-values) are 
too large.
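
For illustration, here is a minimal sketch of the correction, using only 
the numbers from the output quoted below. The variable names are made up 
for this example and this is not the code used inside AER; it simply 
re-evaluates the reported Sargan statistic against a chi-squared 
distribution with df = (excluded instruments) - (endogenous regressors).

```r
# Values read off the quoted summary() output (not recomputed from data)
sargan_stat <- 3.868  # Sargan statistic as printed
n_excluded  <- 6      # daded, momed, libcrd14, age76, age762, nearc4
n_endog     <- 3      # ed76, exp76, exp762

# Number of overidentifying restrictions
df_overid <- n_excluded - n_endog

# p-value with the df that was printed (5) and with the corrected df (3)
p_reported  <- pchisq(sargan_stat, df = 5, lower.tail = FALSE)
p_corrected <- pchisq(sargan_stat, df = df_overid, lower.tail = FALSE)

print(c(df = df_overid, p_reported = p_reported, p_corrected = p_corrected))
```

With df = 5 this reproduces the printed p-value of 0.569; with df = 3 the 
p-value drops to roughly 0.28, so the conclusion of the test is unchanged 
in this particular example.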

I've fixed the problem in AER's devel-version and will release it on CRAN 
in the next few days.

Thanks & best regards,
Z

On Thu, 7 Nov 2013, Hélène Huber-Yahi wrote:

> Hello,
> I'm new to R and I'm currently learning to use package AER, which is
> extremely comprehensive and useful. I have one question related to the
> diagnostics after ivreg: if I understood correctly, the Sargan test provided
> assumes that the statistic follows a chi-squared distribution with degrees
> of freedom equal to the number of excluded instruments minus one. But I have
> read many times that the degrees of freedom of this statistic should equal
> the number of overidentifying restrictions, i.e. the number of excluded
> instruments minus the number of endogenous variables tested. When comparing
> with Stata results (estat overid after ivreg, same with ivreg2 output), the
> statistic is the same as the one provided by R; only the p-value changes
> because the reference distribution is different. Is this command using a
> different flavor of the Sargan test? I did not find the details in the AER
> PDF.
> I'm using Rstudio with R 3.0.2 (Windows 7) and AER is up to date. The
> output I get from R is the following, where the Sargan DF is equal to 5,
> while I thought it would be equal to 6-3=3. The data comes from Verbeek's
> econometrics textbook and the example replicates the one in the book.
> Dependent variable is log of wage, endogenous variables are education,
> experience and its square (3 of them), excluded instruments are parents'
> education etc (6 of them).
>
> > ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76,
> +             data = school)
> > summary(ivmodel, diagnostics = TRUE)
> Call:
> ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
>    south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
>    black + smsa76 + south76, data = school)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -1.63375 -0.22253  0.02403  0.24350  1.32911
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
> ed76         0.0848507  0.0066061  12.844  < 2e-16 ***
> exp76        0.0796432  0.0164406   4.844 1.34e-06 ***
> exp762      -0.0020376  0.0008257  -2.468   0.0136 *
> black       -0.1726723  0.0195231  -8.845  < 2e-16 ***
> smsa76       0.1521693  0.0165207   9.211  < 2e-16 ***
> south76     -0.1204765  0.0154904  -7.778 1.01e-14 ***
>
> Diagnostic tests:
>                  df1  df2 statistic p-value
> Weak instruments    6 2987   965.450  <2e-16 ***
> Wu-Hausman          2 2988     1.949   0.143
> Sargan              5   NA     3.868   0.569
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.3753 on 2990 degrees of freedom
> Multiple R-Squared: 0.2868,	Adjusted R-squared: 0.2854
> Wald test: 178.6 on 6 and 2990 DF,  p-value: < 2.2e-16
>
>
> Would this be caused by the fact that I'm using 2SLS and not GMM (at least
> I suppose) to estimate the IV model? I apologize if this comes from a
> misunderstanding on my part, and I thank you in advance for your help.
>
> Best,
>
> H. Huber