[R-SIG-Finance] R: [Fwd: R-SIG-Finance Digest, Vol 60, Issue 18]

Mon May 18 22:06:31 CEST 2009

I just realized I used the robust option in my Stata 9.2 analysis. When I
remove it, the chi-sq values are much closer to the values I get in R (but
negative, since the consistent model must be listed first in the chi-sq
calculation). However, with my own data Stata warns that V_b-V_B is not
positive definite. Is this a result of unbalanced data? R doesn't give an
error, so I am inclined to ignore the warning in Stata. I am posting my own
results from R and Stata, and attaching the data as a csv.
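
For reference, the statistic that both phtest() and Stata's hausman report
can be written in the notation of the Stata output, with b the FE estimates
and B the RE estimates. The label H is introduced here only for convenience;
this is a sketch of the textbook form, not a transcription of either
program's code.

```latex
H = (b - B)^{\top} \left( V_b - V_B \right)^{-1} (b - B),
\qquad
(B - b)^{\top} \left( V_B - V_b \right)^{-1} (B - b) = -H .
```

The second identity shows why listing the models in the other order flips
the sign of a by-hand calculation. H is guaranteed nonnegative only when
V_b - V_B is positive definite, which is an asymptotic property under Ho
and can fail in a finite sample.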

Thanks, hope I am not wasting too much of your time here.

-Steve

###R-Output###
> library("plm")
>
> fdi <- read.csv("C:/data/mydata.csv", na.strings=".")
> fdiplm<-plm.data(fdi, index = c("id_code_id", "year"))
series    are constants and have been removed
>
> fdi_test<-(lfdi_2000~ lagdlfdi+ laglnstock2000+ lagtradegdp +lagdlgdp)
>
> fdi_test_fe <- plm(fdi_test, data=fdiplm, model="within")
> fdi_test_re <- plm(fdi_test, data=fdiplm, model="random")
>
> summary (fdi_test_fe)
Oneway (individual) effect Within Model

Call:
plm(formula = fdi_test, data = fdiplm, model = "within")

Unbalanced Panel: n=149, T=3-27, N=2697

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-8.2100 -0.4760  0.0452  0.5670  4.8700

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)
lagdlfdi       0.1564759  0.0180645  8.6621 < 2.2e-16 ***
laglnstock2000 0.7621350  0.0246798 30.8809 < 2.2e-16 ***
lagtradegdp    0.0178568  0.0025859  6.9055 5.003e-12 ***
lagdlgdp       0.2601477  0.0427744  6.0818 1.188e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    4606.7
Residual Sum of Squares: 2938
F-statistic: 361.237 on 4 and 2544 DF, p-value: < 2.22e-16
> summary (fdi_test_re)
Oneway (individual) effect Random Effect Model
   (Swamy-Arora's transformation)

Call:
plm(formula = fdi_test, data = fdiplm, model = "random")

Unbalanced Panel: n=149, T=3-27, N=2697

Effects:
                  var std.dev  share
idiosyncratic 1.15487 1.07465 0.6617
individual    0.59044 0.76840 0.3383
theta  :
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.3718  0.6700  0.7081  0.6955  0.7355  0.7401

Residuals :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-9.15000 -0.47900  0.07270 -0.00713  0.59800  3.95000

Coefficients :
                 Estimate Std. Error  t-value  Pr(>|t|)
(Intercept)    16.7744214  0.1552868 108.0222 < 2.2e-16 ***
lagdlfdi        0.1632388  0.0181005   9.0185 < 2.2e-16 ***
laglnstock2000  0.8314432  0.0196444  42.3247 < 2.2e-16 ***
lagtradegdp     0.0119453  0.0020737   5.7605 8.386e-09 ***
lagdlgdp        0.2558009  0.0424599   6.0245 1.696e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    9522.3
Residual Sum of Squares: 3140.8
F-statistic: 1367.42 on 4 and 2692 DF, p-value: < 2.22e-16
>
> phtest(fdi_test_re, fdi_test_fe)

        Hausman Test

data:  fdi_test
chisq = 23.7021, df = 4, p-value = 9.164e-05
alternative hypothesis: one model is inconsistent

###end R output###
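
A by-hand cross-check of phtest() can be sketched in R as follows. This is
only a sketch under stated assumptions: it uses the Grunfeld data that ships
with plm (not the fdi data above), and hausman_by_hand is a hypothetical
helper name, not a plm function.

```r
# Sketch: Hausman statistic by hand, compared with plm::phtest().
# Assumes the 'plm' package is installed and uses its bundled Grunfeld data.
library(plm)

data("Grunfeld", package = "plm")
fm <- inv ~ value + capital

femod <- plm(fm, data = Grunfeld, index = c("firm", "year"), model = "within")
remod <- plm(fm, data = Grunfeld, index = c("firm", "year"), model = "random")

# Quadratic form d' (Va - Vb)^{-1} d over the slopes shared by both models.
# The within model has no intercept, so the RE intercept is dropped.
hausman_by_hand <- function(a, b) {
  common <- setdiff(intersect(names(coef(a)), names(coef(b))), "(Intercept)")
  d  <- coef(a)[common] - coef(b)[common]
  dv <- vcov(a)[common, common] - vcov(b)[common, common]
  as.numeric(t(d) %*% solve(dv) %*% d)
}

h_fe_first <- hausman_by_hand(femod, remod)  # consistent (FE) model first
h_re_first <- hausman_by_hand(remod, femod)  # order swapped: opposite sign

# Degrees of freedom = number of slopes compared
p_value <- pchisq(h_fe_first, df = length(coef(femod)), lower.tail = FALSE)

print(c(fe_first = h_fe_first, re_first = h_re_first, p_value = p_value))
print(phtest(femod, remod))
```

If phtest() returns the same value for either argument order while the
by-hand version changes sign, that would be consistent with phtest()
reporting the absolute value of the quadratic form; this has not been
checked against the plm 1.1-2 source here.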

###Stata 9.2 Output--canned###
xtreg lfdi_2000 lagdlfdi laglnstock2000 lagtradegdp lagdlgdp, fe;

Fixed-effects (within) regression               Number of obs      =      2697
Group variable (i): id_code_id                  Number of groups   =       149

R-sq:  within  = 0.3622                         Obs per group: min =         3
       between = 0.8234                                        avg =      18.1
       overall = 0.6998                                        max =        27

                                                F(4,2544)          =    361.24
corr(u_i, Xb)  = 0.3536                         Prob > F           =    0.0000

------------------------------------------------------------------------------
   lfdi_2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lagdlfdi |   .1564758   .0180645     8.66   0.000     .1210532    .1918985
laglnst~2000 |    .762135   .0246798    30.88   0.000     .7137404    .8105295
 lagtradegdp |   .0178568   .0025859     6.91   0.000     .0127861    .0229274
    lagdlgdp |   .2601478   .0427744     6.08   0.000     .1762716    .3440241
       _cons |   17.01131   .1701713    99.97   0.000     16.67762      17.345
-------------+----------------------------------------------------------------
     sigma_u |  .93048942
     sigma_e |  1.0746505
         rho |  .42847396   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(148, 2544) =    10.73           Prob > F = 0.0000

. estimates store FIX, title(The FE) ;

. xtreg lfdi_2000 lagdlfdi laglnstock2000 lagtradegdp lagdlgdp, re;

Random-effects GLS regression                   Number of obs      =      2697
Group variable (i): id_code_id                  Number of groups   =       149

R-sq:  within  = 0.3606                         Obs per group: min =         3
       between = 0.8402                                        avg =      18.1
       overall = 0.7128                                        max =        27

Random effects u_i ~ Gaussian                   Wald chi2(4)       =   2225.46
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   lfdi_2000 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lagdlfdi |   .1631662   .0180937     9.02   0.000     .1277032    .1986291
laglnst~2000 |    .830845   .0196843    42.21   0.000     .7922645    .8694255
 lagtradegdp |    .011992   .0020779     5.77   0.000     .0079195    .0160645
    lagdlgdp |   .2558113   .0424486     6.03   0.000     .1726136    .3390091
       _cons |   16.77702   .1556693   107.77   0.000     16.47191    17.08212
-------------+----------------------------------------------------------------
     sigma_u |  .77431228
     sigma_e |  1.0746505
         rho |  .34173973   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.  estimates store RAND, title(The RE) ;

. hausman FIX RAND;

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      FIX          RAND        Difference          S.E.
-------------+----------------------------------------------------------------
    lagdlfdi |    .1564758     .1631662       -.0066903               .
laglnst~2000 |     .762135      .830845         -.06871         .014887
 lagtradegdp |    .0178568      .011992        .0058648        .0015393
    lagdlgdp |    .2601478     .2558113        .0043365        .0052695
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       22.94
                Prob>chi2 =      0.0001
                (V_b-V_B is not positive definite)
###End Stata 9.2####

On Mon, May 18, 2009 at 12:26 PM, Steven Archambault
<archstevej at gmail.com> wrote:

> Giovanni,
>
> Thank you so much for your comments. I am a bit new to R, and to these
> mailing lists, so I apologize for being sparse on the details and examples.
> I am using Stata 9.2, which might be the answer to my problem, as you
> described. I have done quite a bit of internet searching, and did not read
> anywhere about the use of a different method for calculating the chi-sq
> value, so thanks for that.
>
>  One more issue I have been thinking about. I am assuming your plm package
> knows that the FE is the consistent model, as I get the same result whether
> the code is phtest(femod, remod) or phtest(remod, femod). The order does
> matter in Stata.
>
> For completeness I am going to post my results using the same Grunfeld
> dataset for both Stata 9.2 (by hand calculation and canned procedure) and
> R.  I am using the plm package version 1.1-2.
>
> Regards,
> Steve
>
>
>
>  ## begin Stata9.2 output##
> xtreg inv value capital, robust re;
>
> Random-effects GLS regression                   Number of obs      =       200
> Group variable (i): firmid                      Number of groups   =        10
>
> R-sq:  within  = 0.7668                         Obs per group: min =        20
>        between = 0.8196                                        avg =      20.0
>        overall = 0.8061                                        max =        20
>
> Random effects u_i ~ Gaussian                   Wald chi2(3)       =     77.70
> corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
>
> ------------------------------------------------------------------------------
>              |               Robust
>       invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        value |   .1097811   .0197587     5.56   0.000     .0710547    .1485076
>      capital |    .308113   .0418387     7.36   0.000     .2261107    .3901153
>        _cons |  -57.83441   24.67795    -2.34   0.019    -106.2023   -9.466507
> -------------+----------------------------------------------------------------
>      sigma_u |   84.20095
>      sigma_e |  52.767964
>          rho |  .71800838   (fraction of variance due to u_i)
> ------------------------------------------------------------------------------
>
> . matrix bfe=e(b);
>
> . matrix vfe=e(V);
>
> . estimates store remod;
>
> . xtreg inv value capital, robust fe;
>
> Fixed-effects (within) regression               Number of obs      =       200
> Group variable (i): firmid                      Number of groups   =        10
>
> R-sq:  within  = 0.7668                         Obs per group: min =        20
>        between = 0.8194                                        avg =      20.0
>        overall = 0.8060                                        max =        20
>
>                                                 F(2,188)           =     40.23
> corr(u_i, Xb)  = -0.1517                        Prob > F           =    0.0000
>
> ------------------------------------------------------------------------------
>              |               Robust
>       invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        value |   .1101238    .019378     5.68   0.000     .0718975    .1483501
>      capital |   .3100653    .042795     7.25   0.000     .2256452    .3944854
>        _cons |  -58.74393   23.37422    -2.51   0.013    -104.8534   -12.63449
> -------------+----------------------------------------------------------------
>      sigma_u |  85.732501
>      sigma_e |  52.767964
>          rho |  .72525012   (fraction of variance due to u_i)
> ------------------------------------------------------------------------------
>
>  ###Hausman by hand###
>
> . estimates store femod;
>
> . matrix vre=e(V);
>
> . matrix bre=e(b);
>
> . matrix bdif=bfe-bre;
>
> . matrix list bdif;
>
> bdif[1,3]
>          value     capital       _cons
> y1  -.00034265  -.00195236   .90952273
>
> . matrix bdifp=bdif';
>
> . matrix dv=vfe-vre;
>
> . matrix dvi=inv(dv);
>
> . matrix list bdif;
>
> bdif[1,3]
>          value     capital       _cons
> y1  -.00034265  -.00195236   .90952273
>
> . matrix list bdifp;
>
> bdifp[3,1]
>                  y1
>   value  -.00034265
> capital  -.00195236
>   _cons   .90952273
>
> . matrix list dvi;
>
> symmetric dvi[3,3]
>               value     capital       _cons
>   value  -7739.3615
> capital   5808.2905   -5305.811
>   _cons   3.6641311   .98569198  -.00051157
>
> . matrix chisq=bdif*dvi*bdifp;
>
> . matrix list chisq;
>
> symmetric chisq[1,1]
>             y1
> y1  -.01956929
> ###Hausman canned###
> .  hausman femod remod;
>
>                  ---- Coefficients ----
>              |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
>              |     femod        remod        Difference          S.E.
> -------------+----------------------------------------------------------------
>        value |    .1101238     .1097811        .0003427               .
>      capital |    .3100653      .308113        .0019524        .0089965
> ------------------------------------------------------------------------------
>                            b = consistent under Ho and Ha; obtained from xtreg
>             B = inconsistent under Ha, efficient under Ho; obtained from xtreg
>
>     Test:  Ho:  difference in coefficients not systematic
>
>                   chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
>                           =    -0.01    chi2<0 ==> model fitted on these
>                                         data fails to meet the asymptotic
>                                         assumptions of the Hausman test;
>                                         see suest for a generalized test
> ## end Stata9.2 output ##
>
> ##begin Output R, using PLM 1.1-2###
>
> > test<-data(Grunfeld, package="Ecdat")
> >
> > fm <- inv~value+capital
> > femod <- plm(fm, Grunfeld, model="within")
> > summary(femod)
> Oneway (individual) effect Within Model
>
> Call:
> plm(formula = fm, data = Grunfeld, model = "within")
>
> Balanced Panel: n=10, T=20, N=200
>
> Residuals :
>     Min.  1st Qu.   Median  3rd Qu.     Max.
> -184.000  -17.600    0.563   19.200  251.000
>
> Coefficients :
>         Estimate Std. Error t-value  Pr(>|t|)
> value   0.110124   0.011857  9.2879 < 2.2e-16 ***
> capital 0.310065   0.017355 17.8666 < 2.2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Total Sum of Squares:    2244400
> Residual Sum of Squares: 523480
> F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16
>
> > remod <- plm(fm, Grunfeld, model="random")
> > summary(remod)
> Oneway (individual) effect Random Effect Model
>    (Swamy-Arora's transformation)
>
> Call:
> plm(formula = fm, data = Grunfeld, model = "random")
>
> Balanced Panel: n=10, T=20, N=200
>
> Effects:
>                    var  std.dev share
> idiosyncratic 2784.458   52.768 0.282
> individual    7089.800   84.201 0.718
> theta:  0.86122
>
> Residuals :
>    Min. 1st Qu.  Median 3rd Qu.    Max.
> -178.00  -19.70    4.69   19.50  253.00
>
> Coefficients :
>               Estimate Std. Error t-value Pr(>|t|)
> (Intercept) -57.834415  28.898935 -2.0013  0.04536 *
> value         0.109781   0.010493 10.4627  < 2e-16 ***
> capital       0.308113   0.017180 17.9339  < 2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Total Sum of Squares:    2381400
> Residual Sum of Squares: 548900
> F-statistic: 328.837 on 2 and 197 DF, p-value: < 2.22e-16
> > phtest(femod, remod)
>
>         Hausman Test
>
> data:  fm
> chisq = 2.3304, df = 2, p-value = 0.3119
> alternative hypothesis: one model is inconsistent
>
> ###end Plm###
>
>
>
>
>
> On Mon, May 18, 2009 at 6:01 AM, Millo Giovanni <
> Giovanni_Millo at generali.com> wrote:
>
>> Dear Steve,
>>
>> I got your inquiry courtesy of Christian Kleiber, who brought it to our
>> attention: please next time you post anything re a given package,
>> include the maintainer's address. We cannot guarantee to parse all the
>> daily digests of the R system!
>>
>> Your problem: can you please provide a reproducible example? Else it is
>> difficult to help, not knowing your data, your results and even the
>> Stata version you're using.
>>
>> In the following I replicate what you might have done on a well-known
>> dataset.
>>
>> From Stata10, on the usual Grunfeld data taken from package "Ecdat":
>>
>> ## begin Stata10 output ##
>> . xtreg inv value capital
>>
>> Random-effects GLS regression                   Number of obs      =       200
>> Group variable: firm                            Number of groups   =        10
>>
>> R-sq:  within  = 0.7668                         Obs per group: min =        20
>>        between = 0.8196                                        avg =      20.0
>>        overall = 0.8061                                        max =        20
>>
>> Random effects u_i ~ Gaussian                   Wald chi2(2)       =    657.67
>> corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000
>>
>> ------------------------------------------------------------------------------
>>          inv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        value |   .1097811   .0104927    10.46   0.000     .0892159    .1303464
>>      capital |    .308113   .0171805    17.93   0.000     .2744399    .3417861
>>        _cons |  -57.83441   28.89893    -2.00   0.045    -114.4753   -1.193537
>> -------------+----------------------------------------------------------------
>>      sigma_u |   84.20095
>>      sigma_e |  52.767964
>>          rho |  .71800838   (fraction of variance due to u_i)
>> ------------------------------------------------------------------------------
>>
>> . estimates store remod
>>
>> . xtreg inv value capital, fe
>>
>> Fixed-effects (within) regression               Number of obs      =       200
>> Group variable: firm                            Number of groups   =        10
>>
>> R-sq:  within  = 0.7668                         Obs per group: min =        20
>>        between = 0.8194                                        avg =      20.0
>>        overall = 0.8060                                        max =        20
>>
>>                                                 F(2,188)           =    309.01
>> corr(u_i, Xb)  = -0.1517                        Prob > F           =    0.0000
>>
>> ------------------------------------------------------------------------------
>>          inv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        value |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
>>      capital |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
>>        _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
>> -------------+----------------------------------------------------------------
>>      sigma_u |  85.732501
>>      sigma_e |  52.767964
>>          rho |  .72525012   (fraction of variance due to u_i)
>> ------------------------------------------------------------------------------
>> F test that all u_i=0:     F(9, 188) =    49.18              Prob > F = 0.0000
>>
>> . estimates store femod
>>
>> . hausman femod remod
>>
>>                  ---- Coefficients ----
>>              |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
>>              |     femod        remod        Difference          S.E.
>> -------------+----------------------------------------------------------------
>>        value |    .1101238     .1097811        .0003427        .0055213
>>      capital |    .3100653      .308113        .0019524        .0024516
>> ------------------------------------------------------------------------------
>>                            b = consistent under Ho and Ha; obtained from xtreg
>>             B = inconsistent under Ha, efficient under Ho; obtained from xtreg
>>
>>    Test:  Ho:  difference in coefficients not systematic
>>
>>                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
>>                          =        2.33
>>                Prob>chi2 =      0.3119
>>
>> .
>> ## end Stata10 output ##
>>
>> while from plm I get
>>
>> ## begin R output ##
>> > data(Grunfeld, package="Ecdat")
>> > fm <- inv~value+capital
>> >
>> > femod <- plm(fm, Grunfeld)
>> > remod <- plm(fm, Grunfeld, model="random")
>> >
>> > phtest(femod, remod)
>>
>>        Hausman Test
>>
>> data:  fm
>> chisq = 2.3304, df = 2, p-value = 0.3119
>> alternative hypothesis: one model is inconsistent
>>
>> ## end R output ##
>>
>> which, besides testifying to the goodness and parsimony of an
>> object-oriented approach as far as screen output is concerned, looks
>> rather consistent to me.
>>
>> I cannot but guess that the problem might stem from different RE
>> estimates: previous versions of Stata used the Wallace-Hussein method by
>> default for computing the variance of random effects. Now Stata uses
>> Swamy-Arora, which has been the default of 'plm' since the beginning.
>> Yet as plm() allows you to choose, you can experiment with different values
>> for the 'random.method' argument in order to see if you get the Stata
>> result. I suggest you start by comparing the coefficient estimates you
>> get from Stata and R: FE should be unambiguous, RE might vary as said
>> above, and for good reason.
>>
>> You also didn't tell us whether your by-hand calculation agrees with
>> the phtest() output (I guess it does not).
>>
>> Please let us know, possibly with a reproducible example and providing
>> all the above info.
>> Giovanni
>>
>> PS please also make sure you're not using any VEEEEERY old version of
>> 'plm' (prior to, say, 0.3): these had a bug in the p-value calculation
>> which made it depend on the order of models compared (so that in the
>> wrong case you got p.value=1).
>>
>> Giovanni Millo
>> Research Dept.,
>> Assicurazioni Generali SpA
>> Via Machiavelli 4,
>> 34132 Trieste (Italy)
>> tel. +39 040 671184
>> fax  +39 040 671160
>>
>> > ------------------------------------------------------------------------
>> >
>> > Subject:
>> > [R-SIG-Finance] Chi-sq Hausman test---R vs Stata
>> > From:
>> > Steven Archambault <archstevej at gmail.com>
>> > Date:
>> > Sun, 17 May 2009 23:14:13 -0600
>> > To:
>> > r-sig-finance at stat.math.ethz.ch
>> >
>> >
>> > Hi all,
>> >
>> > I am running a panel time series regression testing Fixed Effects and
>> > Random Effects. I decided to calculate the chi-sq value for the
>> > Hausman test in both R (phtest) and Stata. I get different results.
>> > Even within Stata, calculating the chi-sq value with the canned
>> > procedure or by hand (using matrices) gives different results. So, the
>> > question should come up there as well.
>> >
>> > Does anybody have any insight on how to pick which results to use? I
>> > guess the one that gives the result I want? Having different programs
>> > give quite different values for the same tests is frustrating me. I'd
>> > be interested in any feedback folks have!
>> >
>> > Thanks,
>> > Steve
>> >
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fdi_data.csv
Type: application/vnd.ms-excel
Size: 174320 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20090518/c6b65272/attachment.xlb>