[R] polr probit versus stata oprobit

Jean Eid jeaneid at chass.utoronto.ca
Thu Nov 11 19:10:34 CET 2004


Thank you Thomas for your answer. It was the weights that are giving me
problems and I still have no idea why. i.e. when I try your example,
everything work fine. However when I do not include the weights=Freq and
[fw=Freq] in both softwares, I do get verry different results.


Jean,

On Thu, 11 Nov 2004, Thomas Lumley wrote:

> On Wed, 10 Nov 2004, Jean Eid wrote:
>
> > Dear Thomas,
> >
> > Where you also able to replicate the second example?  (the exaample
> > that I turned the housing data into numerical variables) That is the one
> > that my estimates differ.
> >
>
> I don't have your second example, but I get the same results from
>     polr(formula = Sat ~ as.numeric(Infl) + as.numeric(Type) +
>       as.numeric(Cont), data = housing, weights = Freq, method = "probit")
> and
>     oprobit Sat Infl Type Cont [fw=Freq]
> for example.
>
>  	-thomas
>
>
> >
> > On Wed, 10 Nov 2004, Thomas Lumley wrote:
> >
> >> On Wed, 10 Nov 2004, Jean Eid wrote:
> >>
> >>> Dear All,
> >>> I have been struggling to understand why for the housing data in MASS
> >>> library R and stata give coef. estimates that are really different. I also
> >>> tried to come up with many many examples myself (see below, of course I
> >>> did not have the set.seed command included) and all of my
> >>> `random' examples seem to give verry similar output. For the housing data,
> >>> I have changed the data into numeric vectors instead of factors/ordered
> >>> factors. I did so to try and get the same results as in STATA and to have
> >>> the housing example as close as possible to the one I constructed.
> >>>
> >>> I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS
> >>> version  7.2-8.
> >>>
> >>>
> >>> here's the example ( I assume that you have STATA installed and can run in
> >>> batch mode, if not the output is also given below)
> >>>
> >>
> >> That example shows the same results with Stata and polr() from MASS.
> >>
> >> For the housing data, I also get the same coefficients in Stata as with
> >> polr():
> >>
> >> In R:
> >> library(MASS)
> >> library(foreign)
> >> write.dta(housing, file="housing.dta")
> >> house.probit<-polr(Sat ~ Infl + Type + Cont, data = housing, weights =
> >> Freq, method = "probit")
> >> summary(house.probit)
> >> -------------------------
> >> Re-fitting to get Hessian
> >>
> >> Call:
> >> polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq,
> >>      method = "probit")
> >>
> >> Coefficients:
> >>                     Value Std. Error   t value
> >> InflMedium     0.3464233 0.06413706  5.401297
> >> InflHigh       0.7829149 0.07642620 10.244063
> >> TypeApartment -0.3475372 0.07229093 -4.807480
> >> TypeAtrium    -0.2178874 0.09476607 -2.299213
> >> TypeTerrace   -0.6641737 0.09180004 -7.235005
> >> ContHigh       0.2223862 0.05812267  3.826153
> >>
> >> Intercepts:
> >>              Value   Std. Error t value
> >> Low|Medium  -0.2998  0.0762    -3.9371
> >> Medium|High  0.4267  0.0764     5.5850
> >>
> >> Residual Deviance: 3479.689
> >> AIC: 3495.689
> >> ------------------------
> >>
> >>
> >> In Stata
> >> -----------------
> >> . use housing.dta
> >> . xi: oprobit Sat i.Infl i.Type i.Cont [fw=Freq]
> >> i.Infl            _IInfl_1-3          (naturally coded; _IInfl_1 omitted)
> >> i.Type            _IType_1-4          (naturally coded; _IType_1 omitted)
> >> i.Cont            _ICont_1-2          (naturally coded; _ICont_1 omitted)
> >>
> >> Iteration 0:   log likelihood = -1824.4388
> >> Iteration 1:   log likelihood = -1739.9254
> >> Iteration 2:   log likelihood = -1739.8444
> >>
> >> Ordered probit estimates                          Number of obs   =    1681
> >>                                                    LR chi2(6)      =   169.19
> >>                                                    Prob > chi2     =   0.0000
> >> Log likelihood = -1739.8444                       Pseudo R2       =   0.0464
> >>
> >> ------------------------------------------------------------------------------
> >>           Sat |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
> >> Interval]
> >> -------------+----------------------------------------------------------------
> >>      _IInfl_2 |   .3464228    .064137     5.40   0.000     .2207165     .472129
> >>      _IInfl_3 |   .7829146    .076426    10.24   0.000     .6331224    .9327069
> >>      _IType_2 |  -.3475367   .0722908    -4.81   0.000    -.4892241   -.2058493
> >>      _IType_3 |  -.2178875    .094766    -2.30   0.021    -.4036254   -.0321497
> >>      _IType_4 |  -.6641735   .0917999    -7.24   0.000     -.844098    -.484249
> >>      _ICont_2 |   .2223858   .0581226     3.83   0.000     .1084676     .336304
> >> -------------+----------------------------------------------------------------
> >>         _cut1 |  -.2998279   .0761537          (Ancillary parameters)
> >>         _cut2 |   .4267208   .0764043
> >> ------------------------------------------------------------------------------
> >>
> >>
> >>
> >>  	-thomas
> >>
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
>




More information about the R-help mailing list