[R-sig-ME] [R] coef se in lme
bates at stat.wisc.edu
Fri Oct 19 00:03:05 CEST 2007
On 10/18/07, dave fournier <otter at otter-rsch.com> wrote:
> Here is one approach to this problem.
> In the AD Model Builder Random Effects package we provide estimated
> standard deviations for any function of the fixed and random effects,
> (here I include the parameters which determine the covariance matrices if
> present) and the random effects. This is for general nonlinear random
> effects models, but the calculations can be used for linear models as
> well. We calculate these estimates as follows. Let L(x,u)
> be the log-likelihood function for the parameters x and u given the
> observed data,
> where u is the vector of random effects and x is the vector of the other
> parameters.
I know it may sound pedantic, but I don't know what a log-likelihood
L(x,u) would be, because you are treating the parameters and the random
effects as if they are the same type of object, and they are not. If
you want to use a Bayesian approach you can kind of level the playing
field and say that everything is a parameter except for the observed
data values. However, Bayesians also need to come up with a prior and
that isn't trivial in this case, as I tried to indicate in my message
about the mcmcsamp chain taking excursions.
I find I can easily confuse myself in the theory of the maximum
likelihood or REML estimates if I am not careful about the terminology
and the roles of the different coefficients in the linear predictor.
I think I would call that function L(x,u) the conditional density of
the random effects given the data. The parameters determine the joint
density for the random effects and responses so plugging the observed
values of the responses into this expression yields the conditional
density of the random effects.
> Let F(x) be the log-likelihood for x after the u have been
> integrated out. This integration might be exact or more commonly via the
> Laplace approximation or something else.
> For any x let uhat(x) be the value of u which maximizes L(x,u),
I think that is what I would call the conditional modes of the random
effects. These depend on the observed responses and on the model parameters.
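For the quadratic (Gaussian, i.e. linear-model) case the conditional mode
has a closed form, which makes a convenient check on a numerical search.
A toy sketch in Python; the function and coefficients are invented purely
for illustration:

```python
# Toy check: for a quadratic L(x, u), the conditional mode uhat(x)
# found numerically agrees with the closed form u = 0.3 * x.
# All coefficients are arbitrary illustrative values.

def L(x, u):
    # A quadratic "conditional log-density" in u, maximized at u = 0.3 * x.
    return -2.0 * (u - 0.3 * x) ** 2

def uhat(x, lo=-10.0, hi=10.0, iters=200):
    # Ternary search: L(x, .) is concave in u, so this converges to the mode.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if L(x, m1) < L(x, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

print(uhat(1.0))  # closed form answer is 0.3
```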
> and let xhat be the value of x which maximizes F(x).
> The estimate for the covariance matrix for the x is then
> S_xx = inv(F_xx) and the estimated full covariance matrix Sigma for the
> x and u is given by
>   [ S_xx                 S_xx * uhat_x                            ]
>   [ (S_xx * uhat_x)'     uhat' * S_xx * uhat_x + inv(L_uu)        ]
> where ' denotes transpose, _x denotes the first derivative wrt x (note
> that uhat is a function of x, so uhat_x makes sense), and _xx, _uu denote
> the second derivatives wrt x and u. We then use Sigma and the delta
> method to estimate the standard deviation of any (differentiable)
> function of x and u.
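A scalar sketch of that recipe (all numbers below are invented, and sign
conventions are glossed over by treating F_xx and L_uu as observed
information, so that their inverses are variances):

```python
# Scalar sketch of the block covariance matrix Sigma and the delta method,
# with dim(x) = dim(u) = 1.  All numbers are made-up illustrative values.

F_xx = 4.0     # "information" for x from the marginal log-likelihood F at xhat
L_uu = 10.0    # "information" for u from L at (xhat, uhat(xhat))
uhat_x = 0.5   # derivative of uhat(x) with respect to x at xhat

S_xx = 1.0 / F_xx                       # inv(F_xx)
Sigma = [
    [S_xx,          S_xx * uhat_x],
    [S_xx * uhat_x, uhat_x * S_xx * uhat_x + 1.0 / L_uu],
]

# Delta method for the (differentiable) function g(x, u) = x + u,
# whose gradient is (1, 1):
grad = [1.0, 1.0]
var_g = sum(grad[i] * Sigma[i][j] * grad[j]
            for i in range(2) for j in range(2))
se_g = var_g ** 0.5
print(se_g)
```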
I'm getting a little bit lost here. In the example you sent based on
Harold's discussion, the dimension of x is 3 and the dimension of u is
10 so Sigma is a 13 by 13 matrix, right? S_xx is 3 by 3 and L_uu is
10 by 10. To form the product S_xx*uhat_x I think that uhat_x needs
to be 3 by 10. Is that right? (I'm used to writing the Jacobian of a
vector-valued function of a vector the other way around.)
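Just bookkeeping, but the conformability question can be checked mechanically:
with dim(x) = 3 and dim(u) = 10, taking uhat_x as 3 by 10 makes every block
of the 13 by 13 Sigma fit. A quick shape-only sketch (no numbers involved):

```python
# Shape bookkeeping only: verify the blocks of Sigma are conformable
# when dim(x) = 3 and dim(u) = 10, with uhat_x taken as 3 x 10.
nx, nu = 3, 10

def matmul_shape(a, b):
    """Return the shape of a matrix product, or raise if non-conformable."""
    (ra, ca), (rb, cb) = a, b
    if ca != rb:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (ra, cb)

S_xx = (nx, nx)        # inv(F_xx), 3 x 3
uhat_x = (nx, nu)      # Jacobian of uhat(x), taken as 3 x 10
L_uu_inv = (nu, nu)    # inv(L_uu), 10 x 10

upper_right = matmul_shape(S_xx, uhat_x)              # S_xx * uhat_x
lower_left = (upper_right[1], upper_right[0])         # its transpose
lower_right = matmul_shape(
    matmul_shape((nu, nx), S_xx), uhat_x)             # uhat_x' * S_xx * uhat_x
print(upper_right, lower_left, lower_right)
```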
It looks like you are missing a _x in the first term in "uhat' * S_xx * uhat_x".
To evaluate L_uu you need a value of x. I assume you will use the
parameter estimates. Correct?
Will the parameterization of x affect the result? If I write the
model in terms of the logarithms of the variances instead of the
variances I will definitely get a different Sigma but will the result
for a linear combination of mu and some of the u's stay invariant? If
it isn't invariant, how do I choose the parameterization of the model?
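A scalar version of that invariance question can be checked by hand,
under the assumption that Sigma is the inverse of the observed
information: reparameterize a variance v as eta = log(v) and compare the
delta-method standard error computed either way (all numbers invented):

```python
import math

# Scalar check of parameterization invariance for a delta-method SE.
# Hypothetical values; I_v plays the role of -d2(logLik)/dv2 at the maximum.
v = 2.0    # estimate of a variance parameter
I_v = 5.0  # observed information for v

# SE of v in the original parameterization:
se_direct = math.sqrt(1.0 / I_v)

# In the eta = log(v) parameterization, the chain rule at the maximum
# gives I_eta = v**2 * I_v (the first-derivative term vanishes at the MLE).
I_eta = v ** 2 * I_v
# Delta method back to v = exp(eta); the gradient is exp(eta) = v.
se_delta = math.sqrt(v * (1.0 / I_eta) * v)

print(se_direct, se_delta)  # the two agree, to first order
```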
Can you give a bit more detail on how you justify mixing derivatives
of the marginal log-likelihood (F) with derivatives of the conditional
density (L)? Do you know that these are on the same scale? I'm
willing to believe that they are - it is just that I can't see right
off why they should be.