[R-sig-ME] Difference Score with Random Effects?
Ken Beath
ken at kjbeath.com.au
Tue Apr 21 03:52:02 CEST 2009
Assuming that the model for Yit has a random intercept, subtracting
consecutive observations cancels the random effect, so its estimate will
be zero.
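To spell out the cancellation, here is a minimal sketch that assumes the
simplest random-intercept model for the levels. The symbols \mu, \tau^2 and
\sigma^2 are my own notation and do not appear in the original post.

```latex
\begin{align*}
Y_{it} &= \mu + U_i + \varepsilon_{it},
  \qquad U_i \sim N(0,\tau^2),\quad
  \varepsilon_{it} \sim N(0,\sigma^2)\ \text{independent},\\
Y_{it} - Y_{i,t-1}
  &= (\mu + U_i + \varepsilon_{it}) - (\mu + U_i + \varepsilon_{i,t-1})
   = \varepsilon_{it} - \varepsilon_{i,t-1},\\
\operatorname{Cov}\bigl(Y_{it} - Y_{i,t-1},\; Y_{i,t-1} - Y_{i,t-2}\bigr)
  &= -\sigma^2,
  \qquad
  \operatorname{Var}\bigl(Y_{it} - Y_{i,t-1}\bigr) = 2\sigma^2 .
\end{align*}
```

Under these assumptions U_i drops out of the difference, and adjacent
differences are negatively correlated (correlation -1/2). A random intercept
can only induce non-negative within-person correlation, so its variance
estimate is pushed to the boundary at zero.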
Adding Yit-1 to the right-hand side of the equation puts the random
component back into the equation. That term could just as easily be moved
to the left-hand side, where it restores a level component to the response,
which the Ui will then attempt to model.
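A small simulation can illustrate both points. This is only a sketch under
assumed values: n, T and the two standard deviations are made up, and it uses
a balanced-panel method-of-moments (ANOVA) variance estimate and pooled OLS
in place of xtmixed or lmer, so it shows the mechanism rather than
reproducing the fits quoted below.

```python
import numpy as np


def simulate_panel(n=2000, T=6, sd_u=20.0, sd_e=40.0, seed=0):
    """Simulate Y[i, t] = 50 + U_i + E_it (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sd_u, size=(n, 1))   # person-level random intercept
    e = rng.normal(0.0, sd_e, size=(n, T))   # occasion-level noise
    return 50.0 + u + e


def anova_between_variance(x):
    """Method-of-moments estimate of the between-person variance for a
    balanced panel x of shape (persons, occasions). It can be negative,
    which a mixed-model fit would report as a zero variance."""
    n, T = x.shape
    person_means = x.mean(axis=1)
    msb = T * person_means.var(ddof=1)
    msw = ((x - person_means[:, None]) ** 2).sum() / (n * (T - 1))
    return (msb - msw) / T


def ols_slope(x, y):
    """Pooled OLS slope of y on x (ignores the random intercept)."""
    x = x.ravel()
    y = y.ravel()
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


y = simulate_panel()
d = np.diff(y, axis=1)      # Y_it - Y_i,t-1
lag = y[:, :-1]             # Y_i,t-1

var_u_levels = anova_between_variance(y)   # between-person variance of levels
var_u_diffs = anova_between_variance(d)    # same estimate for the differences

slope_diff = ols_slope(lag, d)             # difference regressed on the lag
slope_level = ols_slope(lag, y[:, 1:])     # level regressed on the lag

print("between-person variance, levels:     ", var_u_levels)
print("between-person variance, differences:", var_u_diffs)
print("slope (difference on lag):", slope_diff)
print("slope (level on lag):     ", slope_level)
```

The between-person variance estimate for the differences comes out negative,
which is the moment-based counterpart of a variance component stuck at zero.
The two slopes differ by exactly 1, because regressing Yit - Yit-1 on Yit-1
is the same model as regressing Yit on Yit-1 with the coefficient shifted
by one.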
Ken
On Mon, April 20, 2009 11:40 pm, Jeremy Pais wrote:
> In estimating a null/intercept only model using longitudinal panel data in
> person-year format (i.e., long, occasions, repeated measures) and a
> difference score as the dependent variable
>
>
>
> (Yit - Yit-1) = a + Ui + Eit
>
>
>
> the model fails to converge using Stata's xtmixed (flat or discontinuous
> region), and using lmer in R the between-individual variation (Ui) is
> essentially zero. When I include Yit-1 as an independent variable:
>
>
>
> (Yit - Yit-1) = a + b*Yit-1 + Ui + Eit
>
>
>
> the model converges just fine and reports what seem to be reasonable results
> (and the results in Stata and in R correspond). Below is an example using
> Stata. The dependent variable is the difference in family income from t-1 to
> t in thousands of dollars and is grand mean centered.
>
>
>
>
>
> . xtsum dinc
>
> Variable         |       Mean  Std. Dev.        Min       Max |    Observations
> -----------------+--------------------------------------------+----------------
> dinc     overall |  -.1507094   47.16423  -2447.117  2749.135 |     N =   68075
>          between |              24.94159   -675.392  638.0056 |     n =   14869
>          within  |              44.31102  -2029.263  2437.635 | T-bar = 4.57832
>
> . xtmixed dinc ||id:
>
> Performing EM optimization:
>
> Performing gradient-based optimization:
>
> Iteration 0:   log restricted-likelihood = -359298.64
> numerical derivatives are approximate
> flat or discontinuous region encountered
> Iteration 1:   log restricted-likelihood = -358930.79
> numerical derivatives are approximate
> flat or discontinuous region encountered
.
>
>
>
>
>
> . xtmixed dinc laginc ||id:
>
> Performing EM optimization:
>
> Performing gradient-based optimization:
>
> Iteration 0:   log restricted-likelihood = -353006.12
> Iteration 1:   log restricted-likelihood = -353004.36
> Iteration 2:   log restricted-likelihood = -353004.35
>
> Computing standard errors:
>
> Mixed-effects REML regression                   Number of obs      =     68075
> Group variable: id                              Number of groups   =     14869
>
>                                                 Obs per group: min =         1
>                                                                avg =       4.6
>                                                                max =        11
>
>                                                 Wald chi2(1)       =  19885.84
> Log restricted-likelihood = -353004.35          Prob > chi2        =    0.0000
>
> ------------------------------------------------------------------------------
>         dinc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       laginc |  -.4772225   .0033841  -141.02   0.000    -.4838553   -.4705897
>        _cons |  -1.546629   .2300348    -6.72   0.000    -1.997489   -1.095769
> ------------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
>   Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> id: Identity                 |
>                    sd(_cons) |   17.95496   .2854801      17.40406     18.5233
> -----------------------------+------------------------------------------------
>                 sd(Residual) |   40.56819   .1264724      40.32107    40.81683
> ------------------------------------------------------------------------------
> LR test vs. linear regression: chibar2(01) =  1940.19 Prob >= chibar2 = 0.0000
>
>
>
>
>
> I've also replicated this pattern using other data (still in person-year
> format) and other dependent variables.
>
>
>
> I must be missing something obvious here. Does taking the first difference
> in a null model essentially condition out the between individual variation
> (like a fixed-effects estimator Xit-Xibar, even though there are more than
> two occasions with repeated measures)? And why then does the inclusion of a
> lagged variable seem to resolve the issue? Or does it?
>
>
>
> Any insight or suggested literature on this would be greatly appreciated.
>
>
>
> Jeremy
>
>
> --
> Jeremy Pais
> Doctoral Student
> Department of Sociology
> University at Albany, SUNY
>
> jeremy.pais01 at albany.edu
> jpais.albany at gmail.com
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>