[R-sig-ME] lme and prediction intervals

Tue Apr 6 23:00:08 CEST 2010

D Chaws wrote, Tue Apr 6 04:48:33 CEST 2010:

> As a professional statistician, what is the appropriate method to
> compute the SE of these predictions which accounts for the variance of
> the fixed and random effects?

I asked the same question myself about prediction last month.  I think
Doug's response to me was, in a nutshell, how can you statistically
*define* the variance contribution of the random effects?  (Doug,
please correct me if I am wrong or over-simplifying).  After thinking
about it for awhile, I indeed can *not* see what the variance of the
random effect means from a classical perspective.

Nobody seems to have conceptual difficulties with the errors from the
fixed effects, so I'll sketch out a toy example to show how I've
thought about it in terms of the theoretical statistics that I dimly
remember (again, corrections from anyone are most welcome).

Writing it out, a simple linear model is Y ~ N( A + BX, s^2).  Our
vector of parameters in the first distribution is (theta_1) = (A, B,
s).  From a classical perspective, A and B are fixed parameters, so
that when we estimate A, B, we are maximizing the likelihood L:

L ( theta_1 | X ) = P ( Y=y | theta_1 ) = 1/sqrt (2 pi sigma^2) exp -
(Y - A - BX )^2 / sigma^2

where (theta_1) is set by definite numbers, (theta_1) = (A, B, s).

Now, thinking about the random effects model, we're saying that A is a
random effect, so it's not just a set of numbers.  Therefore A is a
random variable, distributed by A ~ N(mu, sigma^2).  Our second vector
of parameters for this distribution is (theta_2) = (mu, sigma).  On
the face of it, this doesn't necessarily seem catastrophic, since we
just want the marginal probability distribution function for Y based
on the joint probability of Y, A,

P (Y=y | theta_1) = \int^\infty_infty P ( Y=y, A=a | theta_1, theta_2) da

However, this is where I get stuck in the classical perspective.  We
need to state *something* about the quantity P ( Y=y, A=a | theta_1,
theta_2) inside the integral, but it's not clear to me what we can say
about the relationship of A and Y, and presumably theta_2 is derived
from the observations of Y.  I think Doug's point is that if we have
the probability P ( Y=y, A=a | theta_1, theta_2), you can't just use
conditioning to break it down such as:

P ( Y=y, A=a | theta_1, theta_2) *not* equal to P ( Y=y | theta_1)
times P ( A=a | theta_2)

because somehow you got (or would like to get) theta_2 based on the Y
(data) as well.  So they A and Y are not independent.

In the Bayesian perspective, I guess you'd say that the posterior =
prior x likelihood, so that:

P ( theta_1, theta_2 | A, Y)  is proportional to P ( theta_1) times P
( theta_2 | A, Y, theta_1)

and then insert some belief about the the prior distribution of
theta_1, and do this iteratively for each of the parameter values
until you get convergence.

Anyway, this is how I've thought about it.  Corrections welcome.

David

--
David Hsu
PhD Candidate, Urban Planning
University of Washington
(e-mail) dhsu2 at uw.edu