[R] Unable to reproduce Stata Heckman sample selection estimates

Yuan Yuan y.yuan at vt.edu
Fri Nov 25 22:37:53 CET 2011


Hi Arne,

I believe I figured out why the Stata coefficient estimates differed 
from R's: in my case, the outcome response variable is binary, so 
the outcome equation is a probit model. From my reading of the 
sampleSelection paper, it seems that the Tobit-2 model has a 
continuous outcome response variable. The Stata command used was 
heckprob, which assumes both the outcome and the selection equations 
are probit models. When I compared the Stata heckman command with 
the R results, I found the estimates were the same.
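
For reference, the difference between the two models can be sketched as follows. The notation (s for selection, y for the outcome, z and x for the regressors, u and epsilon for the errors) is assumed here for illustration and is not taken from the sampleSelection paper or the Stata manual; the likelihood is the usual textbook form for a probit with sample selection, not a transcription of what either program computes internally.

```latex
\begin{align*}
% Selection equation, shared by both models (probit)
s_i^* &= z_i'\gamma + u_i, \qquad s_i = \mathbf{1}[s_i^* > 0] \\[4pt]
% Tobit-2 (Stata heckman): continuous outcome, observed only if s_i = 1
y_i &= x_i'\beta + \varepsilon_i, \qquad
(u_i, \varepsilon_i) \sim N\!\left(0,
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right) \\[4pt]
% Probit with sample selection (Stata heckprob): binary outcome, observed only if s_i = 1
y_i^* &= x_i'\beta + \varepsilon_i, \qquad y_i = \mathbf{1}[y_i^* > 0], \qquad
(u_i, \varepsilon_i) \sim N\!\left(0,
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right) \\[4pt]
% Log-likelihood of the probit with sample selection
\ln L &= \sum_{s_i = 0} \ln \Phi(-z_i'\gamma)
      + \sum_{s_i = 1,\, y_i = 1} \ln \Phi_2(x_i'\beta,\, z_i'\gamma;\, \rho)
      + \sum_{s_i = 1,\, y_i = 0} \ln \Phi_2(-x_i'\beta,\, z_i'\gamma;\, -\rho)
\end{align*}
```

Here Phi is the standard normal CDF and Phi_2 is the bivariate standard normal CDF with the given correlation. In the binary-outcome model the error variance is fixed at 1, so there is no sigma to estimate, which is one reason the outcome coefficients from a Tobit-2 fit to a 0/1 response need not match heckprob.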

Sorry for not picking up on that difference earlier.

So it seems that selection() is perhaps not what I'm looking for, 
unless there is a way to specify a probit outcome model. Is there 
a package out there that estimates probit models with Heckman sample 
selection? It looks like SemiParBIVProbit might work for me.

 - Clara

On Friday, November 25, 2011 11:05:31 am Yuan Yuan wrote:
> Hi Arne,
> 
> Thanks for the reply.
> 
> I am using R version 2.14.0 and sampleSelection version 0.6.12.
> 
> I estimate the model by the 1-step ML method. However, when I use
> the 2-step method, the standard errors are reported as NA.
> 
> I use the selection() function, very basic call, something to the
> effect of: selection(selectionFormula, outcomeFormula, data =
> aDataFrame), where the formulas are very straightforward and basic
> as well, y ~ x1 + x2 + ... + xp.
> 
> I have read the associated paper, which is where I got the idea to
> pass the coefficients from a selection object to the start
> argument.
> 
> I will work on creating a minimal reproducible example; the dataset
> is large and confidential, the models long-ish.
> 
>  - Clara
> 
> On Friday, November 25, 2011 04:04:52 am Arne Henningsen wrote:
> > On 25 November 2011 04:37, Yuan Yuan <y.yuan at vt.edu> wrote:
> > > Hello,
> > > 
> > > I am working on reproducing someone's analysis which was done in
> > > Stata. The analysis is estimation of a standard Heckman sample
> > > selection model (Tobit-2), for which I am using the sampleSelection
> > > package and the selection() function. I have a few problems with the
> > > estimation:
> > > 
> > > 1) The reported standard error for all estimates is Inf ...
> > > vcov(selectionObject) yields Inf in every cell.
> > > 
> > > 2) While the selection equation coefficient estimates are almost
> > > exactly the same as the Stata results, the outcome equation
> > > coefficient estimates are quite different (different sign in one case,
> > > order of magnitude difference in some other cases).
> > > 
> > > 3) I can't seem to figure out how to specify the initial values for
> > > the MLE ... whatever argument I pass to start (even of the form
> > > coef(selectionObject)), I get the following error:
> > > Error in gr[, fixed] <- NA : (subscript) logical subscript too long
> > > 
> > > I have to admit I am pretty confused by #1, I feel like I must be
> > > doing something wrong, missing something obvious, but I have no idea
> > > what. I figure #2 might be because the algorithms (selection and
> > > Stata) are just finding different local maxima, but because of #3 I
> > > can't test that guess by using different initial values in selection.
> > > 
> > > Let me know if I should provide any more information. Thanks in
> > > advance for any pointers in the right direction.
> > 
> > Yes, please provide more information (see also the posting guide [1]),
> > e.g. which version of R and which version of the sampleSelection
> > package are you using? Do you estimate the model by the two-step
> > approach or by the 1-step maximum likelihood method? Which commands
> > did you use? Can you send us a reproducible example? Have you read the
> > paper about using the sampleSelection package [2]?
> > 
> > [1] http://www.r-project.org/posting-guide.html
> > [2] http://www.jstatsoft.org/v27/i07
> > 
> > Best wishes from Copenhagen,
> > Arne


