[R-sig-ME] comparing posterior means

Ben Pelzer b.pelzer at maw.ru.nl
Wed Oct 15 21:05:16 CEST 2014


Dear Douglas, Harold and Emmanuel,

The rather technical discussion is difficult for me to follow, and I 
still feel unsure about the right answer to my question. To test the 
difference between the two posterior means, does it make sense to test 
the difference of the two random effects, u0A - u0B, using the 
(co)variances of both to test H0: u0A - u0B = 0? Or should the variance 
of the fixed effect b0 also be taken into account (which I now believe 
is not needed)? Please help me. Humble regards,

Ben.


On 15-10-2014 20:16, Douglas Bates wrote:
> On Wed, Oct 15, 2014 at 12:24 PM, Doran, Harold <HDoran at air.org> wrote:
>
>     Doug:
>
>     I grab the variance/covariance matrix of the random effects in a
>     way that I think will make you cringe, but I will share it here, as
>     I am interested in learning how it can be done more efficiently.
>     Keep in mind two things. First, my lme.eiv function (which stands
>     for linear mixed model error-in-variables) uses Henderson's
>     equations as typically described and then stands on your shoulders,
>     using almost all of the functions in the Matrix package for sparse
>     matrices.
>
>     BTW, though my function is not available in a package, I am happy
>     to share it with you. It has complete technical documentation and
>     is written using S3 methods with various common extractor
>     functions typically used (and more). The function is intended to
>     be used when the variables on the RHS are measured with error.
>     Otherwise, my function simply matches lmer's output.
>
>     So, let me use the following to represent the linear model as
>
>     Xb = y
>
>     Assume X is the leftmost matrix in Henderson's equations (so it is
>     a big 2x2 blocked matrix), b is a vector holding both the fixed
>     and random effects, and y is the outcome. The matrix X is big (and
>     very sparse in my applications) and so I find its Cholesky
>     decomposition and then simply solve the triangular systems until a
>     solution is reached.
>
>
> I think I am missing a couple of steps here. Henderson's mixed model 
> equations, as described, say, in the Wikipedia entry, are a set of 
> penalized normal equations.  That is, they look like
>
> X'X b = X'y
>
> but with a block structure and with a penalty on the random effects 
> crossproduct matrix.  In the Wikipedia article the lower right block 
> is written as Z'R^{-1}Z + G^{-1} and I would describe G^{-1} as the 
> penalty term in that it penalizes large values of the coefficients b.
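For readers following the link, the blocked form referred to here (in the
Wikipedia entry's notation; a paraphrase, not a quotation from either
message) is

    \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}
    \begin{pmatrix} \hat\beta \\ \hat u \end{pmatrix}
    =
    \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}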
>
> In lmer a similar penalized least squares (PLS) problem is solved for 
> each evaluation of the profiled log-likelihood or profiled REML 
> criterion.  The difference is that instead of solving for the 
> conditional means of the random effects on the original scale we solve 
> for the conditional means of the "spherical" random effects. Also, the 
> order of the coefficients in the PLS problem is switched so that the 
> random effects come before the fixed effects. Doing things this way 
> allows for evaluation of the evaluation of the profiled log-likelihood 
> from the determinant of the sparse Cholesky factor and the penalized 
> residual sum-of-squares (equations 34 and 41, for REML) in the paper 
> on arxiv.org.
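A rough sketch of the formulation being described (my paraphrase of the
arXiv paper; the details should be checked against equations 34 and 41
there): for each value of the covariance parameters \theta, the penalized
least squares problem

    r^2(\theta) = \min_{u,\beta} \; \|y - X\beta - Z\Lambda_\theta u\|^2 + \|u\|^2

is solved, and the profiled deviance is evaluated as

    -2\tilde\ell(\theta) = \log|L_\theta|^2 + n\left[1 + \log\!\left(\frac{2\pi\, r^2(\theta)}{n}\right)\right]

with L_\theta the sparse Cholesky factor and r^2(\theta) the penalized
residual sum of squares; the REML criterion replaces n by n - p and adds
\log|R_X|^2.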
>
>     After a solution is reached (here is where you will cringe), I
>     find the inverse of the big matrix X as
>
>     mat1.inv <- solve(L, I)
>
>     where L is the Cholesky factor of X and I is a conformable
>     identity matrix. This, in essence, finds the inverse of the big
>     matrix X, and then I can grab the variances and covariances of
>     everything I need from this after the solution is reached.
>
>
> You're right.  I did cringe.  At the risk of sounding like a broken 
> record, you really don't need to calculate the inverse of that large, 
> sparse matrix to get the information you want.  At most you have to 
> solve block by block.
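A minimal sketch of the "solve only what you need" idea with the Matrix
package (the matrix A and the index set `need` below are hypothetical
stand-ins, not objects from either poster's code):

    library(Matrix)
    set.seed(42)
    ## Hypothetical sparse, symmetric, positive-definite coefficient matrix,
    ## standing in for the blocked matrix in Henderson's equations.
    M  <- rsparsematrix(200, 40, density = 0.05)
    A  <- forceSymmetric(crossprod(M) + Diagonal(40))
    ch <- Cholesky(A)                       # sparse Cholesky factorization
    ## Only the columns of A^{-1} that are actually needed, e.g. the two
    ## random effects being compared (indices here are illustrative):
    need <- c(3, 17)
    E <- matrix(0, nrow = 40, ncol = length(need))
    E[cbind(need, seq_along(need))] <- 1
    Vcols <- solve(ch, E, system = "A")     # those columns of A^{-1} only
    Vcols[need, ]                           # the 2 x 2 block of (co)variances

This recovers just the small block of the inverse that a pairwise
comparison would use, without ever forming the full inverse.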
>
> I'm currently doing something similar in the Julia implementation.  
> The aforementioned arxiv.org paper gives 
> expressions for the gradient of the profiled log-likelihood.  It can 
> be a big win to evaluate the analytic gradient when optimizing the 
> criterion.  The only problem is that you need to do nearly as much 
> work to evaluate the gradient as to invert the big matrix.  Many years 
> ago when Saikat and I derived those expressions I thought it was great 
> and coded up the optimization using that.  I fit a model to a large, 
> complex data set using only the function evaluation, which took a 
> couple of hours. Then I started it again using the gradient 
> evaluation, eagerly anticipating how fast it would be.  I watched and 
> watched and eventually went to bed because it was taking so long.  The 
> next morning I got up to find that it had completed one iteration in 
> twelve hours.
>
> It is possible to do that evaluation for certain model types, but the 
> general evaluation by just pushing it into a sparse matrix calculation 
> can be formidable.  The Julia implementation does use the gradient but 
> only for models with nested grouping factors (and that is not yet 
> complete) or for models with two crossed or nearly crossed grouping 
> factors.
>
> So in terms of the original question, it would be reasonable to 
> evaluate the conditional covariance of the random effects under 
> similar designs - either nested grouping factors, which includes, as a 
> trivial case, a single grouping factor, or crossed grouping factors.  
> In the latter case, however, the covariance matrix will be large and 
> dense so you may question whether it is really important to evaluate 
> it.  I can make sense of 2 by 2 covariance matrices and maybe 3 by 3 
> but 85 by 85 dense covariance matrices are difficult for me to interpret.
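For the single-grouping-factor case in Ben's question, here is a hedged
sketch of one way to get the full conditional covariance out of an lmer
fit (the model name `modela`, data frame `dat`, and country labels "A" and
"B" are placeholders from the original post; the formula
Var(b | y) = sigma^2 * Lambda (Lambda'Z'Z Lambda + I)^{-1} Lambda' is my
reading of the lme4 formulation, conditional on the estimated fixed
effects and covariance parameters):

    library(lme4)
    library(Matrix)
    modela <- lmer(Y ~ 1 + (1 | country), data = dat)

    L      <- getME(modela, "L")       # sparse Cholesky factor of Lambda'Z'Z Lambda + I
    Lambda <- getME(modela, "Lambda")  # relative covariance factor
    q      <- nrow(Lambda)

    ## conditional covariance of the spherical random effects u, then of b = Lambda u
    Vu <- sigma(modela)^2 * solve(L, diag(q), system = "A")
    Vb <- Lambda %*% Vu %*% t(Lambda)

    ## variance of the contrast u0A - u0B for two countries "A" and "B"
    lev <- rownames(ranef(modela)$country)
    iA  <- match("A", lev); iB <- match("B", lev)
    v   <- Vb[iA, iA] + Vb[iB, iB] - 2 * Vb[iA, iB]
    z   <- (ranef(modela)$country[iA, 1] - ranef(modela)$country[iB, 1]) / sqrt(v)

Note that this conditions on the estimated fixed effects, so it does not
carry the fixed-effect uncertainty that Harold's Henderson-based
formulation keeps in its joint coefficient matrix.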
>
>
>     From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas Bates
>     Sent: Wednesday, October 15, 2014 12:37 PM
>     To: Doran, Harold
>     Cc: Ben Pelzer; r-sig-mixed-models at r-project.org
>     Subject: Re: [R-sig-ME] comparing posterior means
>
>     On Wed, Oct 15, 2014 at 9:30 AM, Doran, Harold <HDoran at air.org> wrote:
>
>     Ben
>
>     Yes, you can do this comparison of the conditional means using the
>     variance of the linear combination AND there is in fact a
>     covariance term between them. I do not believe that covariance
>     term between BLUPs is available in lmer (I wrote my own mixed
>     model function that does spit this out, however).
>
>     Just to be didactic for a moment, take a look at Henderson's
>     equations (say, at the link below):
>
>     http://en.wikipedia.org/wiki/Mixed_model
>
>     The covariance term between the BLUPs that you would need comes
>     from the lower-right block of the leftmost matrix at the final
>     solution. lmer is not parameterized this way, so the comparison is
>     not intended to show how that term would be extracted from lmer,
>     only to show that it does exist in the likelihood and can
>     (conceivably) be extracted or computed from the terms given by lmer.
>
>     I would disagree, Harold, about the relationship between the
>     formulation used in lmer and that in Henderson's mixed model
>     equations.  There is a strong relationship, which is explicitly
>     shown in http://arxiv.org/abs/1406.5823
>
>     Also shown there is why the modifications from Henderson's
>     formulation to that in lmer lead to flexibility in model
>     formulation and much greater speed and stability in fitting such
>     models.  Reversing the positions of the random effects and fixed
>     effects in the penalized least squares problem and using a
>     relative covariance factor instead of the covariance matrix allows
>     for the profiled log-likelihood or profiled REML criterion to be
>     evaluated. Furthermore, they allow for the sparse Cholesky
>     decomposition to be used effectively.  (Henderson's formulation
>     does not do as good a job of preserving sparsity.)
>
>     I believe you want the conditional variance-covariance matrix for
>     the random effects given the observed data and the parameter
>     values. The sparse Cholesky factor L is the Cholesky factor of
>     that variance-covariance, up to the scale factor.  It is, in fact,
>     more stable to work with the factor L than to try to evaluate the
>     variance-covariance matrix itself.
>
>     I'm happy to flesh this out in private correspondence if you wish.
>
>
>
>
>         -----Original Message-----
>         From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Pelzer
>         Sent: Wednesday, October 15, 2014 8:56 AM
>         To: r-sig-mixed-models at r-project.org
>         Subject: [R-sig-ME] comparing posterior means
>
>         Dear list,
>
>         Suppose we have the following two-level null-model, for data
>         from respondents (lowest level 1) living in countries (highest
>         level 2):
>
>         Y(ij) = b0j + eij = (b0 + u0j)  + eij
>
>         b0j is the country-mean for country j
>         b0 is the "grand mean"
>         u0j is the deviation from the grand mean for country j, or the
>         level-2 residual
>         eij is the level-1 residual
>
>         The model is estimated by:  lmer(Y ~ 1 + (1|country))
>
>         My question is about comparing two particular posterior
>         country-means.
>         As far as I know, for a given country j, the posterior mean is
>         equal to
>         bb0 + uu0j, where bb0 is the estimate of b0 and uu0j is the
>         posterior residual estimate of u0j.
>
>         To compare two particular posterior country means and test
>         whether they differ significantly, would it be necessary to
>         know the variance of
>         bb0+uu0j for each of the two countries, or would it be
>         sufficient to
>         only know the variance of uu0j?
>
>         The latter variance (of uu0j) can be extracted using
>
>         rr <- ranef(modela, condVar=TRUE)
>         attr(rr[[1]], "postVar")
>
>         However, the variance of bb0+uu0j also depends on the variance
>         of bb0
>         and the covariance of bb0 and uu0j (if this covariance is not
>         equal to
>         zero, of course, which I don't know...).
>
>         On the other hand, the difference between two posterior
>         country means, for countries A and B say, would equal
>         bb0 + u0A - (bb0 + u0B) = u0A - u0B, meaning that I wouldn't
>         need to worry about the variance of bb0.
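(For completeness, in the same notation: the variance of that contrast is

    Var(u0A - u0B) = Var(u0A) + Var(u0B) - 2 Cov(u0A, u0B)

so what is needed is the pair of variances and the covariance between the
two posterior residuals, not the variance of bb0; this is exactly the
covariance term discussed in the replies above.)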
>
>         So my main question is about comparing and testing the difference
>         between two posterior country means. Thanks for any help,
>
>         Ben.
>
>         _______________________________________________
>         R-sig-mixed-models at r-project.org mailing list
>         https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>




