[R-sig-ME] comparing posterior means
Douglas Bates
bates at stat.wisc.edu
Wed Oct 15 20:16:28 CEST 2014
On Wed, Oct 15, 2014 at 12:24 PM, Doran, Harold <HDoran at air.org> wrote:
> Doug:
>
> I grab the variance/covariance matrix of the random effects in a way that
> I think will make you cringe, but will share it here and am interested in
> learning how it can be done more efficiently. Keep in mind basically two
> principles. First, my lme.eiv function (which stands for linear mixed model
> error-in-variables) uses Henderson's equations as typically described and
> then stands on your shoulders and uses almost all of the functions in the
> Matrix package for sparse matrices.
>
> BTW, though my function is not available in a package, I am happy to share
> it with you. It has complete technical documentation and is written using
> S3 methods with various common extractor functions typically used (and
> more). The function is intended to be used when the variables on the RHS
> are measured with error. Otherwise, my function simply matches lmer’s
> output.
>
> So, let me use the following to represent the linear model as
>
> Xb = y
>
> Assume X is the leftmost matrix in Henderson's equation (so it is a big
> 2x2 blocked matrix), b is a vector holding both the fixed and random
> effects, and y is the outcome. The matrix X is big (and very sparse in my
> applications) and so I find its Cholesky decomposition and then simply
> solve the triangular systems until a solution is reached.
>
I think I am missing a couple of steps here. Henderson's mixed model
equations, as described, say, in the Wikipedia entry, are a set of
penalized normal equations. That is, they look like
X'X b = X'y
but with a block structure and with a penalty on the random effects
crossproduct matrix. In the Wikipedia article the lower right block is
written as Z'R^{-1}Z + G^{-1} and I would describe G^{-1} as the penalty
term, in that it penalizes large values of the random-effects coefficients.
In lmer a similar penalized least squares (PLS) problem is solved for each
evaluation of the profiled log-likelihood or profiled REML criterion. The
difference is that instead of solving for the conditional means of the
random effects on the original scale we solve for the conditional means of
the "spherical" random effects. Also, the order of the coefficients in the
PLS problem is switched so that the random effects come before the fixed
effects. Doing things this way allows the profiled log-likelihood to be
evaluated from the determinant of the sparse Cholesky factor and the
penalized residual sum-of-squares (equations 34 and, for REML, 41 in the
paper on arxiv.org).
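A minimal sketch of that bookkeeping, assuming lme4's getME() accessors and
taking equation 34 as the ML (non-REML) form, using the sleepstudy example
shipped with lme4:

library(lme4)
fm  <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
n   <- nobs(fm)
cmp <- getME(fm, "devcomp")$cmp
ldL2  <- cmp["ldL2"]   # log(|L|^2), twice the log-determinant of the sparse Cholesky factor
pwrss <- cmp["pwrss"]  # penalized residual sum of squares at the solution
## profiled deviance: log|L|^2 + n * (1 + log(2*pi*pwrss/n))
unname(ldL2 + n * (1 + log(2 * pi * pwrss / n)))
deviance(fm)           # should agree, up to rounding, at the fitted theta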
>
>
> After a solution is reached (here is where you will cringe), I find the
> inverse of the big matrix X as
>
> mat1.inv <- solve(L, I)
>
> where L is the Cholesky factor of X and I is a conformable identity
> matrix. This, in essence, finds the inverse of the big matrix X, and then I
> can grab the variances and covariances of everything I need from this after
> the solution is reached.
>
You're right. I did cringe. At the risk of sounding like a broken record,
you really don't need to calculate the inverse of that large, sparse matrix
to get the information you want. At most you have to solve block by block.
I'm currently doing something similar in the Julia implementation. The
aforementioned arxiv.org paper gives expressions for the gradient of the
profiled log-likelihood. It can be a big win to evaluate the analytic
gradient when optimizing the criterion. The only problem is that you need
to do nearly as much work to evaluate the gradient as to invert the big
matrix. Many years ago when Saikat and I derived those expressions I
thought it was great and coded up the optimization using that. I fit a
model to a large, complex data set using only the function evaluation,
which took a couple of hours. Then I started it again using the gradient
evaluation, eagerly anticipating how fast it would be. I watched and
watched and eventually went to bed because it was taking so long. The next
morning I got up to find that it had completed one iteration in twelve
hours.
It is possible to do that evaluation for certain model types, but the
general evaluation by just pushing it into a sparse matrix calculation can
be formidable. The Julia implementation does use the gradient but only for
models with nested grouping factors (and that is not yet complete) or for
models with two crossed or nearly crossed grouping factors.
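Coming back to the solve(L, I) step: a minimal sketch of the "solve rather
than invert" alternative, using the Matrix package's sparse Cholesky
methods. The matrix A below is only a hypothetical stand-in for the big
blocked system matrix; the point is that individual elements or blocks of
the inverse can be pulled out on demand from the factor.

library(Matrix)
set.seed(1)
## stand-in for a large, sparse, symmetric positive-definite system matrix
A <- forceSymmetric(crossprod(rsparsematrix(100, 100, density = 0.05)) +
                    Diagonal(100))
L <- Cholesky(A)                    # sparse Cholesky factor, as in solve(L, I)
## one column of A^{-1}, obtained by a sparse solve against the factor
e7 <- sparseMatrix(i = 7, j = 1, x = 1, dims = c(100, 1))
a77 <- solve(L, e7, system = "A")[7, 1]   # the (7, 7) element of A^{-1}
all.equal(a77, solve(A)[7, 7])      # matches the full inverse, element by element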
So in terms of the original question, it would be reasonable to evaluate
the conditional covariance of the random effects under similar designs -
either nested grouping factors, which includes, as a trivial case, a single
grouping factor, or crossed grouping factors. In the latter case, however,
the covariance matrix will be large and dense, so you may question whether
it is really important to evaluate it. I can make sense of 2 by 2
covariance matrices, and maybe 3 by 3, but 85 by 85 dense covariance
matrices are difficult for me to interpret.
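For the single-grouping-factor case in the original question, the comparison
can be done from what lmer already returns, without touching the big system
at all. A minimal sketch, assuming Ben's fit
modela <- lmer(Y ~ 1 + (1 | country)), two hypothetical country labels "A"
and "B", and treating the cross-country conditional covariance (which
condVar does not report) as negligible:

rr <- ranef(modela, condVar = TRUE)
re <- rr$country
pv <- drop(attr(re, "postVar"))           # conditional variance of each u0j
i  <- which(rownames(re) == "A")          # hypothetical country labels
j  <- which(rownames(re) == "B")
d  <- re[i, "(Intercept)"] - re[j, "(Intercept)"]  # bb0 cancels in the difference
se <- sqrt(pv[i] + pv[j])                 # ignores any cross-country covariance
d / se                                    # approximate z statistic for u0A - u0B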
> From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
> Bates
> Sent: Wednesday, October 15, 2014 12:37 PM
> To: Doran, Harold
> Cc: Ben Pelzer; r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] comparing posterior means
>
> On Wed, Oct 15, 2014 at 9:30 AM, Doran, Harold <HDoran at air.org> wrote:
>
> Ben
>
> Yes, you can do this comparison of the conditional means using the
> variance of the linear combination AND there is in fact a covariance term
> between them. I do not believe that covariance term between BLUPs is
> available in lmer (I wrote my own mixed model function that does spit this
> out, however).
>
> Just to be didactic for a moment. Take a look at Henderson's equation (say,
> at the link below)
>
> http://en.wikipedia.org/wiki/Mixed_model
>
> The covariance term between the BLUPs that you would need comes from the
> lower right block of the leftmost matrix at the final solution. Lmer is not
> parameterized this way, so the comparison is not intended to show how that
> term would be extracted from lmer, only to show that it does exist in the
> likelihood and can (conceivably) be extracted or computed from the terms
> given by lmer.
>
> I would disagree, Harold, about the relationship between the formulation
> used in lmer and that in Henderson's mixed model equations. There is a
> strong relationship, which is explicitly shown in
> http://arxiv.org/abs/1406.5823
>
> Also shown there is why the modifications from Henderson's formulation to
> that in lmer lead to flexibility in model formulation and much greater
> speed and stability in fitting such models. Reversing the positions of the
> random effects and fixed effects in the penalized least squares problem and
> using a relative covariance factor instead of the covariance matrix allows
> for the profiled log-likelihood or profiled REML criterion to be evaluated.
> Furthermore, they allow for the sparse Cholesky decomposition to be used
> effectively. (Henderson's formulation does not do as good a job of
> preserving sparsity.)
>
> I believe you want the conditional variance-covariance matrix for the
> random effects given the observed data and the parameter values. The
> sparse Cholesky factor L is the Cholesky factor of the inverse of that
> variance-covariance, up to the scale factor. It is, in fact, more stable to
> work with the factor L than to try to evaluate the variance-covariance
> matrix itself.
>
> I'm happy to flesh this out in private correspondence if you wish.
>
> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:
> r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Pelzer
> Sent: Wednesday, October 15, 2014 8:56 AM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] comparing posterior means
>
> Dear list,
>
> Suppose we have the following two-level null-model, for data from
> respondents (lowest level 1) living in countries (highest level 2):
>
> Y(ij) = b0j + eij = (b0 + u0j) + eij
>
> b0j is the country-mean for country j
> b0 is the "grand mean"
> u0j is the deviation from the grand mean for country j, or the level-2
> residual
> eij is the level-1 residual
>
> The model is estimated by: lmer(Y ~ 1+(1|country))
>
> My question is about comparing two particular posterior country-means.
> As far as I know, for a given country j, the posterior mean is equal to
> bb0 + uu0j, where bb0 is the estimate of b0 and uu0j is the posterior
> residual estimate of u0j.
>
> To compare two particular posterior country means and test whether they
> differ significantly, would it be necessary to know the variance of
> bb0+uu0j for each of the two countries, or would it be sufficient to
> only know the variance of uu0j?
>
> The latter variance (of uu0j) can be extracted using
>
> rr <- ranef(modela, condVar=TRUE)
> attr(rr[[1]], "postVar")
>
> However, the variance of bb0+uu0j also depends on the variance of bb0
> and the covariance of bb0 and uu0j (if this covariance is not equal to
> zero, of course, which I don't know...).
>
> On the other hand, the difference between two posterior country means
> for countries A and B say, would
> equal bb0 + uu0A - (bb0 + uu0B) = uu0A - uu0B, meaning that I wouldn't need
> to worry about the variance of bb0.
>
> So my main question is about comparing and testing the difference
> between two posterior country means. Thanks for any help,
>
> Ben.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>