[R-sig-ME] lme and prediction intervals
dhsu2 at uw.edu
Tue Apr 6 23:00:08 CEST 2010
D Chaws wrote, Tue Apr 6 04:48:33 CEST 2010:
> As a professional statistician, what is the appropriate method to
> compute the SE of these predictions which accounts for the variance of
> the fixed and random effects?
I asked the same question myself about prediction last month. I think
Doug's response to me was, in a nutshell, how can you statistically
*define* the variance contribution of the random effects? (Doug,
please correct me if I am wrong or over-simplifying). After thinking
about it for a while, I indeed can *not* see what the variance of the
random effect means from a classical perspective.
Nobody seems to have conceptual difficulties with the errors from the
fixed effects, so I'll sketch out a toy example to show how I've
thought about it in terms of the theoretical statistics that I dimly
remember (again, corrections from anyone are most welcome).
Writing it out, a simple linear model is Y ~ N( A + BX, s^2 ). Our
vector of parameters for this distribution is (theta_1) = (A, B, s).
From a classical perspective, A and B are fixed parameters, so that
when we estimate A and B, we are maximizing the likelihood L:
L ( theta_1 | Y, X ) = P ( Y=y | theta_1 ) = 1/sqrt( 2 pi s^2 ) exp( -
( Y - A - BX )^2 / ( 2 s^2 ) )
where (theta_1) is a vector of definite numbers, (theta_1) = (A, B, s).
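To make that concrete, here is a small numerical sketch of the fixed-effects likelihood (the data and parameter values A = 1, B = 2, s = 0.5 are simulated purely for illustration, not from the thread):

```python
# Toy check of the fixed-effects likelihood L(theta_1 | data) for
# Y ~ N(A + B*X, s^2).  Names (A, B, s) follow the text above; the
# data are simulated only to have something to evaluate L on.
import numpy as np

rng = np.random.default_rng(0)
A_true, B_true, s_true = 1.0, 2.0, 0.5
x = rng.uniform(0, 1, size=500)
y = A_true + B_true * x + rng.normal(0, s_true, size=500)

def loglik(A, B, s, x, y):
    """Gaussian log-likelihood: sum over i of log N(y_i | A + B*x_i, s^2)."""
    resid = y - A - B * x
    return np.sum(-0.5 * np.log(2 * np.pi * s**2) - resid**2 / (2 * s**2))

# On a sample this large the likelihood should clearly prefer the
# generating values over a badly shifted intercept.
print(loglik(A_true, B_true, s_true, x, y) >
      loglik(A_true + 1.0, B_true, s_true, x, y))  # True
```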
Now, thinking about the random effects model, we're saying that A is a
random effect, so it's no longer just a fixed number. Instead A is a
random variable, distributed as A ~ N( mu, sigma^2 ). Our second
vector of parameters, for this distribution, is (theta_2) = (mu,
sigma). On the face of it, this doesn't necessarily seem
catastrophic, since we just want the marginal probability density of Y
based on the joint density of Y and A:
P ( Y=y | theta_1, theta_2 ) = \int_{-\infty}^{\infty} P ( Y=y, A=a | theta_1, theta_2 ) da
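In the all-normal case this integral can actually be checked numerically, because it has a closed form: with A ~ N(mu, tau^2) and Y | A ~ N(A + B*x, s^2), marginally Y ~ N(mu + B*x, tau^2 + s^2). A sketch (all parameter values here are illustrative, not from the thread):

```python
# Integrate the random intercept A out of the joint density of (Y, A)
# on a grid, and compare against the closed-form normal marginal
# Y ~ N(mu + B*x, tau^2 + s^2).
import numpy as np

def norm_pdf(z, mean, var):
    """Density of N(mean, var) evaluated at z."""
    return np.exp(-(z - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

mu, tau, B, s = 0.5, 1.0, 2.0, 0.7
x, y = 0.3, 1.5

# Marginal via a wide, fine Riemann sum over a.
a = np.linspace(mu - 8 * tau, mu + 8 * tau, 20001)
da = a[1] - a[0]
joint = norm_pdf(y, a + B * x, s**2) * norm_pdf(a, mu, tau**2)
marginal_numeric = joint.sum() * da

# Closed form for the same marginal.
marginal_exact = norm_pdf(y, mu + B * x, tau**2 + s**2)

print(abs(marginal_numeric - marginal_exact) < 1e-6)  # True
```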
However, this is where I get stuck in the classical perspective. We
need to state *something* about the quantity P ( Y=y, A=a | theta_1,
theta_2 ) inside the integral, but it's not clear to me what we can
say about the relationship of A and Y, and presumably theta_2 is
derived from the observations of Y. I think Doug's point is that even
if we have the joint probability P ( Y=y, A=a | theta_1, theta_2 ), we
cannot simply use conditioning to factor it into independent pieces:
P ( Y=y, A=a | theta_1, theta_2 ) is *not* equal to P ( Y=y | theta_1 )
times P ( A=a | theta_2 )
because somehow you got (or would like to get) theta_2 from the Y
(data) as well. So A and Y are not independent.
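The dependence is easy to see by simulation: since Y = A + B*x + e, Cov(A, Y) = Var(A) = tau^2, which is nonzero whenever there is any random-effect variance at all. A quick check (parameter values illustrative only):

```python
# Simulate the random-intercept model and confirm that A and Y are
# correlated: Cov(A, Y) should come out near Var(A) = tau^2 = 1.
import numpy as np

rng = np.random.default_rng(1)
mu, tau, B, s, x = 0.0, 1.0, 2.0, 0.5, 0.3
a = rng.normal(mu, tau, size=200_000)            # draws of the random effect
y = a + B * x + rng.normal(0, s, size=200_000)   # responses given those draws

cov_ay = np.cov(a, y)[0, 1]
print(cov_ay)  # close to tau^2 = 1.0
```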
In the Bayesian perspective, I guess you'd say that the posterior =
prior x likelihood, so that:
P ( theta_1, theta_2 | A, Y) is proportional to P ( theta_1) times P
( theta_2 | A, Y, theta_1)
and then insert some belief about the prior distribution of theta_1,
and do this iteratively for each of the parameter values until you
get convergence.
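The "posterior proportional to prior times likelihood" step can be sketched for the simplest piece of the model: inferring the random-effect mean mu from observed group intercepts a_i ~ N(mu, tau^2), with tau treated as known and a vague N(0, 10^2) prior on mu. This is a hypothetical one-parameter grid illustration, not the full iterative scheme for all of (theta_1, theta_2):

```python
# Grid version of posterior ∝ prior × likelihood for one parameter, mu.
# Everything is on the log scale until the final normalization.
import numpy as np

rng = np.random.default_rng(2)
tau, mu_true = 1.0, 2.0
a_obs = rng.normal(mu_true, tau, size=50)   # "observed" group intercepts

grid = np.linspace(-5, 10, 3001)
step = grid[1] - grid[0]
log_prior = -grid**2 / (2 * 10.0**2)        # N(0, 10^2) prior on mu
log_lik = np.array([np.sum(-(a_obs - m)**2 / (2 * tau**2)) for m in grid])

log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum() * step                   # normalize to a density on the grid

post_mean = np.sum(grid * post) * step
print(post_mean)                            # near mu_true = 2.0
```

With 50 observations and a diffuse prior, the posterior mean lands close to the sample mean of the a_i, as it should.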
Anyway, this is how I've thought about it. Corrections welcome.
PhD Candidate, Urban Planning
University of Washington
(e-mail) dhsu2 at uw.edu