[R-sig-ME] [R] coef se in lme

Fri Oct 19 12:10:27 CEST 2007

ouglas Bates wrote:
 > On 10/18/07, dave fournier <otter at otter-rsch.com> wrote:
 >
 >
 >> Here is one approach to this problem.
 >>
 >
 >
 >> In the AD Model Builder Random Effects package we provide estimated
 >> standard deviations for any function of the fixed and random effects,
 >> (here I include the parameters which detemine the covarince matrices if
 >> present) and the random effects. This is for general nonlinear random
 >> effects models, but the calculations can be used for linear models as
 >> well. We calculate these estimates as follows. Let L(x,u)
 >> be the log-likelihood function for the parameters x and u given the
 >> observed data,
 >> where u is the vector of random effects and x is the vector of the other
 >> parameters.
 >>
 >
 > I know it may sound pedantic but I don't know what a log-likelihood
 > L(x,u) would be because you are treating parameters and the random
 > effects as if they are the same type of object and they're not.  If
 > you want to use a Bayesian approach you can kind of level the playing
 > field and say that everything is a parameter except for the observed
 > data values.  However, Bayesians also need to come up with a prior and
 > that isn't trivial in this case, as I tried to indicate in my message
 > about the mcmcsamp chain taking excursions.
 >
 > I find I can easily confuse myself in the theory of the maximum
 > likelihood or REML estimates if I am not careful about the terminology
 > and the roles of the different coefficients in the linear predictor.
 > I think I would call that function L(x,u) the conditional density of
 > the random effects given the data.  The parameters determine the joint
 > density for the random effects and responses so plugging the observed
 > values of the responses into this expression yields the conditional
 > density of the random effects
 >
OK we will call them that.

 >> Let F(x) be the log-likelihood for x after the u have been
 >> integrated out. This integration might be exact or more commonly via the
 >> Laplace approximation or something else.
 >> For any x let uhat(x) be the value of u which maximizes L(x,u),
 >>
 >
 > I think that is what I would call the conditional modes of the random
 > effects.  These depend on the observed responses and the model
 > parameters.
 >

   That's right.
 >
 >> and let xhat be the value of x which maximizes F(x).
 >>
 >
 >
 >> The estimate for the covariance matrix for the x is then
 >> S_xx = inv(F_xx) and the estimated full covariance matrix Sigma for the
 >> x and u is given by
 >>
 >
 >
 >> S_xx                 S_xx * uhat_x
 >> (S_xx * uhat_x)' uhat' * S_xx * uhat_x + inv(L_uu)
 >>
 >
 >
 >> where ' denotes transpose _x denotes first derivative wrt x (note that
 >> uhat is a function of x so that uhat_x makes sense) and _xx _uu denote
 >> the second derivatives wrt x and u. we then use Sigma and the delta
 >> method to estimate the standard deviation of any (differentiable)
 >> function of x and u.
 >>
 >
 > I'm getting a little bit lost here.  In the example you sent based on
 > Harold's discussion, the dimension of x is 3 and the dimension of u is
 > 10 so Sigma is a 13 by 13 matrix,
that is right.

 > right?  S_xx is 3 by 3 and L_uu is
 > 10 by 10.  To form the product S_xx*uhat_x I think that uhat_x needs
 > to be 3 by 10.  Is that right?  (I'm used to writing the Jacobian of a
 > vector-valued function of a vector the other way around.)
 >
Yes it is either 3x10 or 10x3.   Since S_xx is 10 by 10 if one wirtes 
S_xx * uhat_x  uhat_x must be 10 by 3
i.e 10 rows and 3 columns.

 > It looks like you are missing a _x in the first term in "uhat' * S_xx 
* uhat_x".
 >
 >
That is right it should be  uhat_x' * S_xx * uhat_x
 > To evaluate L_uu you need a value of x.  I assume you will use the
 > parameter estimates. Correct?
 >
 >
Yes the derivatives are evaluated at  xhat,uhat.

 > Will the parameterization of x affect the result?
No.    supposte that   we have a reparamseterization  of the x's given by

   x=g(y)

  then wrt y the function is   F(g(y)) so the Hessian wrt y is

                    g_y' * F_xx * g_y

This is due to the fact that F_x=0.  Taking the inverse gives

                inv( g_y) * inv(F_xx) * inv(g_y')

so that when you do the delta method the g_y matrices cancel out.
 > If I write the
 > model in terms of the logarithms of the variances instead of the
 > variances I will definitely get a different Sigma
Yes but if you then use to delta method to construct the covariance 
matrix for the depdent variables
exp(log_sigmas) you will recover the original. This is just the chain rule.

 > but will the result
 > for a linear combination of mu and some of the u's stay invariant?  If
 > it isn't invariant, how do I choose the parameterization of the
 > variance components?
 >
Same as above. the whole thing is independent of how one parameterizes 
the x's.
 > Can you give a bit more detail on how you justify mixing derivatives
 > of the marginal log-likelihood (F) with derivatives of the conditional
 > density (L).  Do you know that these are on the same scale?  I'm
 > willing to believe that they are - it is just that I can't see right
 > off why they should be.
 >
 >
Think of it this way.  You can use the marginal likelihood Hessian F_xx 
to construct the estimated covariance matrix
for the independent variables x. Then via the delta  method you 
calculate the covariance matrix for the dependent variables
uhat(x). So far this is just pure frequentist stuff. Now if you know 
what that uhat actually are inv(L_uu) is an estimate for the
variance of the u's.  So a simple estimate of the overall variance of 
the u's is to add the two together. So
uhat_x' * S * uhat_x is the extra variance in the u's introduced by the 
uncertainty in the true value of the uhat
due to uncertainty in the true value of the x's.

-- 
David A. Fournier
P.O. Box 2040, Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com