[R-sig-ME] Formula df when combining imputed data
Ben Pelzer
b.pelzer at maw.ru.nl
Thu Jun 1 18:24:16 CEST 2017
Sorry list, but I forgot to mention that the 10 variables were actually
10 imputed versions of the same underlying variable, which has missing
values. That is why Rubin's method of combining regression results comes
into the picture.
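
For reference, Rubin's combining rules in their usual textbook form are sketched below. The notation is an assumption made here for illustration ($m$ imputed datasets, estimate $\hat{Q}_i$ with squared standard error $W_i$ from dataset $i$), and the last line is assumed to correspond to equation (2) of Carlin et al., which is not reproduced in this message:

```latex
\begin{align}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(pooled estimate)} \\
\bar{W} &= \frac{1}{m}\sum_{i=1}^{m} W_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{W} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)} \\
\nu &= (m-1)\left[1+\frac{\bar{W}}{\bigl(1+\frac{1}{m}\bigr)B}\right]^{2}
  && \text{(df of the $t$ reference distribution)}
\end{align}
```

The pooled standard error is $\sqrt{T}$, and the test statistic $\bar{Q}/\sqrt{T}$ is referred to a $t$ distribution with $\nu$ df.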
Ben Pelzer.
On 1-6-2017 18:20, Ben Pelzer wrote:
> Dear list,
>
> In a given dataset, I have 10 dichotomous variables, the missing values
> of which were substituted by multiple imputation techniques. For each
> variable, a glmmPQL model was estimated. The model has a random
> intercept across countries and schools-within-countries. For one of the
> 10 variables the syntax is:
>
> library(MASS)  # provides glmmPQL
> themodel <- glmmPQL(yvariable ~ 1 + Gender + AGE + migrant + rep + missing_rep +
>                       Schoolsize + Schoolmaterials + GDP,
>                     random = list(country = ~ 1, CNTSCHID = ~ 1),
>                     family = binomial, data = pisas)
>
> The results show:
>
>                     Value  Std.Error    DF    t-value p-value
> (Intercept)     -6.221943  0.9882684 15501  -6.295802  0.0000
> Gender          -0.493744  0.0363663 15501 -13.576956  0.0000
> AGE              0.177124  0.0612163 15501   2.893407  0.0038
> migrant         -0.311810  0.0867452 15501  -3.594553  0.0003
> rep             -2.510684  0.2342525 15501 -10.717855  0.0000
> missing_rep     -1.986272  0.3907376 15501  -5.083390  0.0000
> Schoolsize       0.000536  0.0000633  7699   8.470045  0.0000
> Schoolmaterials  0.334300  0.0525469  7699   6.361938  0.0000
> GDP              0.000013  0.0000081    31   1.598743  0.1200
>
>
> I ran this model for each of the 10 imputed variables and then combined
> the results using the method proposed by Rubin, which is also explained
> by Carlin et al. in fmwww.bc.edu/RePEc/bocode/c/carlin.pdf. Equation (2)
> on page 4 shows how to calculate the number of df for the t-test of each
> regression coefficient. This is where I got stuck. Evaluating the
> formula gives the following numbers of df:
>
> DF given equation (2)
>
> (Intercept)       19.91120
> Gender            18.75981
> AGE               19.63237
> migrant           21.30057
> rep               28.47710
> missing_rep      133.05131
> Schoolsize       122.45054
> Schoolmaterials   74.71955
> GDP             7231.16666
>
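
The df calculation described above can be sketched in R as follows. This is a minimal sketch, assuming that equation (2) is the classical Rubin df formula; `pool_rubin`, `est` and `se` are illustrative names, and the numbers in the example call are made up rather than taken from the PISA models.

```r
# Pool the m estimates of ONE coefficient with Rubin's rules.
# est: numeric vector with the m point estimates (one per imputed dataset)
# se:  numeric vector with the m corresponding standard errors
pool_rubin <- function(est, se) {
  m <- length(est)
  stopifnot(m > 1, length(se) == m)
  qbar <- mean(est)              # pooled point estimate
  W    <- mean(se^2)             # within-imputation variance
  B    <- var(est)               # between-imputation variance (divisor m - 1)
  Tvar <- W + (1 + 1 / m) * B    # total variance
  r    <- (1 + 1 / m) * B / W    # relative increase in variance due to missingness
  df   <- (m - 1) * (1 + 1 / r)^2  # classical Rubin df (assumed to be equation (2))
  c(estimate = qbar, se = sqrt(Tvar), df = df)
}

# Made-up example with m = 5 imputations of a single coefficient
res <- pool_rubin(est = c(0.10, 0.12, 0.11, 0.09, 0.13), se = rep(0.05, 5))
print(res)
```

With real fits, the inputs for one coefficient could be collected from the "Value" and "Std.Error" columns of `summary(fit)$tTable` for each of the 10 glmmPQL fits. Under this formula the df depend only on the number of imputations and on the ratio of between- to within-imputation variance: the number of countries or schools does not enter, the value cannot fall below m - 1, and it becomes very large when the estimates hardly differ between imputations.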
>
> As can be seen, the df in the glmmPQL results above are very different
> from those calculated with equation (2) of Carlin et al. in the Stata
> Journal. I realize that the df in the glmmPQL results cannot be entirely
> correct, because the missing values of yvariable were imputed and the
> variable was then analyzed as though it had no missing values at all.
> But I also wonder whether the df calculated with equation (2) are the
> "better" ones, because the differences are so large. E.g. for GDP,
> which is a country-level variable, I would expect a low number of df, as
> there are only 33 countries in the data. Could it be that the df formula
> of equation (2) cannot be used here? Or even worse: is the method
> proposed by Rubin for calculating the standard errors of the regression
> coefficients not really applicable to a logistic model with random
> country and school effects?
>
> Thanks for any advice!!
> Ben Pelzer.