When estimating a null (intercept-only) model on longitudinal panel data in
person-year format (i.e., long format, one row per person per measurement
occasion) with a difference score as the dependent variable,
(Yit - Yit-1) = a + Ui + Eit
the model fails to converge in Stata's xtmixed ("flat or discontinuous
region encountered"), and with lmer in R the estimated between-individual
variation (Ui) is essentially zero. When I include Yit-1 as an independent
variable:
(Yit - Yit-1) = a + b*Yit-1 + Ui + Eit
the model converges just fine and reports what seem to be reasonable results
(and the results in Stata and in R correspond). Below is an example using
Stata. The dependent variable (dinc) is the change in family income from t-1
to t, in thousands of dollars, and is grand-mean centered.
. xtsum dinc

Variable         |      Mean  Std. Dev.        Min        Max |    Observations
-----------------+--------------------------------------------+----------------
dinc     overall | -.1507094   47.16423  -2447.117   2749.135 |     N =   68075
         between |             24.94159   -675.392   638.0056 |     n =   14869
         within  |             44.31102  -2029.263   2437.635 | T-bar = 4.57832

. xtmixed dinc ||id:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -359298.64
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1: log restricted-likelihood = -358930.79
numerical derivatives are approximate
flat or discontinuous region encountered
.

. xtmixed dinc laginc ||id:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -353006.12
Iteration 1: log restricted-likelihood = -353004.36
Iteration 2: log restricted-likelihood = -353004.35
Computing standard errors:

Mixed-effects REML regression                   Number of obs      =     68075
Group variable: id                              Number of groups   =     14869

                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        11

                                                Wald chi2(1)       =  19885.84
Log restricted-likelihood = -353004.35          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        dinc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      laginc |  -.4772225   .0033841  -141.02   0.000    -.4838553   -.4705897
       _cons |  -1.546629   .2300348    -6.72   0.000    -1.997489   -1.095769
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   17.95496   .2854801      17.40406     18.5233
-----------------------------+------------------------------------------------
                sd(Residual) |   40.56819   .1264724      40.32107    40.81683
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  1940.19 Prob >= chibar2 = 0.0000

I've also replicated this pattern using other data (still in person-year
format) and other dependent variables.
I must be missing something obvious here. Does taking the first difference
in a null model essentially condition out the between-individual variation
(as a fixed-effects estimator does with Xit - Xibar), even though there are
more than two occasions per person? And if so, why does including the
lagged variable seem to resolve the issue? Or does it?
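
To make the first question concrete, here is a minimal simulation sketch. It
is written in Python rather than Stata or R purely so that it is
self-contained, and it rests on an assumption that has not been checked
against the income data: that levels follow a simple random-intercept model,
Yit = ai + eit. Under that assumption the intercept ai cancels when
differencing, and adjacent differences share one error term with opposite
sign, so their within-person correlation is about -0.5. A random intercept can
only represent non-negative within-person covariance, so a moment-based
estimate of the between-person variance of the differences comes out negative,
which would be consistent with lmer reporting an estimate at the zero
boundary. All function names and parameter values below are hypothetical.

```python
import random


def simulate_differences(n_people=2000, n_waves=6, sd_between=20.0,
                         sd_within=40.0, seed=1):
    """Simulate levels Y_it = a_i + e_it (an assumed model) and return,
    for each person, the list of first differences Y_it - Y_i,t-1."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n_people):
        a_i = rng.gauss(0.0, sd_between)  # person-specific intercept
        y = [a_i + rng.gauss(0.0, sd_within) for _ in range(n_waves)]
        # a_i appears in both y[t] and y[t-1], so it cancels here
        panels.append([y[t] - y[t - 1] for t in range(1, n_waves)])
    return panels


def anova_variance_components(panels):
    """One-way ANOVA (method-of-moments) estimates for a balanced panel.
    Returns (between-person variance, within-person variance); the
    between estimate is not constrained to be non-negative."""
    n = len(panels)
    k = len(panels[0])
    person_means = [sum(p) / k for p in panels]
    grand_mean = sum(person_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ss_within = sum((x - m) ** 2
                    for p, m in zip(panels, person_means) for x in p)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / k, ms_within


def lag1_correlation(panels):
    """Pooled correlation between adjacent differences within a person."""
    xs = [p[t] for p in panels for t in range(len(p) - 1)]
    ys = [p[t + 1] for p in panels for t in range(len(p) - 1)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    diffs = simulate_differences()
    between_var, within_var = anova_variance_components(diffs)
    print("between-person variance estimate:", between_var)
    print("within-person variance estimate: ", within_var)
    print("lag-1 correlation of differences:", lag1_correlation(diffs))
```

With these settings the between-person estimate should come out negative and
the lag-1 correlation close to -0.5. Whether the same mechanism explains the
xtmixed warnings on the real data, and what including the lag changes, is a
separate question.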
Any insight or suggested literature on this would be greatly appreciated.
Jeremy
--
Jeremy Pais
Doctoral Student
Department of Sociology
University at Albany, SUNY
jeremy.pais01@albany.edu
jpais.albany@gmail.com