[R-sig-ME] lme and prediction intervals
David Hsu
dhsu2 at uw.edu
Tue Apr 6 23:00:08 CEST 2010
D Chaws wrote, Tue Apr 6 04:48:33 CEST 2010:
> As a professional statistician, what is the appropriate method to
> compute the SE of these predictions which accounts for the variance of
> the fixed and random effects?
I asked the same question myself about prediction last month. I think
Doug's response to me was, in a nutshell, how can you statistically
*define* the variance contribution of the random effects? (Doug,
please correct me if I am wrong or over-simplifying). After thinking
about it for a while, I indeed can *not* see what the variance of the
random effect means from a classical perspective.
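For what it's worth, the naive calculation that people (myself
included) seem to want looks something like the sketch below, using
lme4's sleepstudy data. It simply plugs in the *estimated* variance
components as if they were the true ones and ignores their
uncertainty, which I take to be exactly the step Doug is questioning:

library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
newdat <- data.frame(Days = 0:9)
X <- model.matrix(~ Days, newdat)            # fixed-effects design for new data
pred <- as.vector(X %*% fixef(fm))           # population-level prediction
var.fix <- diag(X %*% vcov(fm) %*% t(X))     # uncertainty in the fixed effects
var.ran <- diag(X %*% VarCorr(fm)$Subject %*% t(X))  # between-Subject variance for a
                                                     # new subject (Z = X here: random
                                                     # intercept + slope in Days)
var.res <- attr(VarCorr(fm), "sc")^2         # residual variance
se.pred <- sqrt(var.fix + var.ran + var.res)
cbind(newdat, pred = pred, lwr = pred - 2 * se.pred, upr = pred + 2 * se.pred)

(The 2*se band is only a rough normal-theory interval, not something
with exact coverage.)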
Nobody seems to have conceptual difficulties with the errors from the
fixed effects, so I'll sketch out a toy example to show how I've
thought about it in terms of the theoretical statistics that I dimly
remember (again, corrections from anyone are most welcome).
Writing it out, a simple linear model is Y ~ N(A + B*X, sigma^2). Our
vector of parameters for this distribution is theta_1 = (A, B, sigma).
From a classical perspective, A and B are fixed parameters, so that
when we estimate A and B, we are maximizing the likelihood L:
L(theta_1 | x, y) = P(Y = y | theta_1)
                  = 1/sqrt(2 pi sigma^2) * exp( -(y - A - B*x)^2 / (2 sigma^2) )
where theta_1 = (A, B, sigma) is just a set of fixed, definite numbers.
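Just to be concrete about the easy case: with made-up values for A, B
and sigma, the familiar prediction interval comes straight out of
predict.lm, which combines the uncertainty in the estimates of A and B
with the residual variance:

set.seed(42)                                 # made-up data: A = 1, B = 2, sigma = 0.5
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = c(0.25, 0.75)),
        interval = "prediction")             # SE combines var(Ahat, Bhat) and sigma^2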
Now, thinking about the random effects model, we're saying that A is a
random effect, so it's not just a fixed number. Rather, A is a random
variable, distributed as A ~ N(mu, tau^2), and A drops out of theta_1,
which becomes theta_1 = (B, sigma). Our second vector of parameters,
for the distribution of A, is theta_2 = (mu, tau). On the face of it,
this doesn't necessarily seem catastrophic, since we just want the
marginal probability distribution function for Y based on the joint
probability of Y and A:
P(Y = y | theta_1, theta_2) = \int_{-\infty}^{\infty} P(Y = y, A = a | theta_1, theta_2) da
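Mechanically that integral is easy enough to evaluate, e.g.
numerically with made-up parameter values, as in the little helper
below (the name marg and all the numbers are just for illustration);
the conceptual question is what justifies the theta_2 we plug in:

marg <- function(y, x, B = 2, sigma = 0.5, mu = 1, tau = 0.7) {
  # P(Y = y | theta_1, theta_2): integrate the joint density over a,
  # where Y | A = a ~ N(a + B*x, sigma^2) and A ~ N(mu, tau^2)
  integrate(function(a) dnorm(y, mean = a + B * x, sd = sigma) *
                        dnorm(a, mean = mu, sd = tau),
            lower = -Inf, upper = Inf)$value
}
marg(y = 2.5, x = 0.5)   # made-up y and x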
However, this is where I get stuck in the classical perspective. We
need to say *something* about the quantity P(Y = y, A = a | theta_1,
theta_2) inside the integral, but it's not clear to me what we can say
about the relationship of A and Y, and presumably theta_2 has to be
estimated from the observations of Y. I think Doug's point is that
even if we write down P(Y = y, A = a | theta_1, theta_2), we can't
just break it apart into independent pieces, i.e.
P(Y = y, A = a | theta_1, theta_2) is *not* equal to
P(Y = y | theta_1, theta_2) times P(A = a | theta_2),
because Y literally contains A (Y = A + B*X + error), so A and Y are
not independent; and on top of that, you got (or would like to get)
theta_2 from the Y (data) as well.
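To see the non-factoring concretely in the toy model: the joint
density is P(Y = y | A = a, theta_1) * P(A = a | theta_2), while the
marginal of Y works out to N(mu + B*x, sigma^2 + tau^2), and the
product of the two marginals does not reproduce the joint, since
cov(Y, A) = tau^2 is not zero. A quick check with made-up numbers:

B <- 2; sigma <- 0.5; mu <- 1; tau <- 0.7        # made-up parameter values
x <- 0.5; y <- 2.5; a <- 1.3                     # made-up data point
joint <- dnorm(y, mean = a + B * x, sd = sigma) *            # P(Y = y | A = a) *
         dnorm(a, mean = mu, sd = tau)                       #   P(A = a)
indep <- dnorm(y, mean = mu + B * x, sd = sqrt(sigma^2 + tau^2)) *  # P(Y = y) *
         dnorm(a, mean = mu, sd = tau)                              #   P(A = a)
c(joint = joint, if.independent = indep)         # clearly not equal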
In the Bayesian perspective, I guess you'd say that the posterior is
proportional to the prior times the likelihood, with A treated as just
another unknown:
P(theta_1, theta_2, A | Y) is proportional to P(theta_1, theta_2)
times P(A | theta_2) times P(Y | A, theta_1),
then insert some belief about the prior distribution of theta_1 and
theta_2, and update (or sample) each unknown in turn until you get
convergence.
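If it helps to see the "iterate until convergence" part in code, below
is a crude random-walk Metropolis sketch for the toy model, with flat
priors and everything (data, starting values, step size) made up; it
is only meant to show the mechanics, not what lmer or any real MCMC
package does. With a single observation per draw of A, only the total
variance sigma^2 + tau^2 is identified, so the sketch samples
(mu, B, log v) where v^2 = sigma^2 + tau^2:

set.seed(1)                              # made-up data from the toy model
n <- 200
x <- runif(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = sqrt(0.5^2 + 0.7^2))

# log posterior = log(flat prior) + log marginal likelihood,
# with parameters th = c(mu, B, log_v)
log_post <- function(th)
  sum(dnorm(y, mean = th[1] + th[2] * x, sd = exp(th[3]), log = TRUE))

n_iter <- 5000
draws <- matrix(NA, n_iter, 3, dimnames = list(NULL, c("mu", "B", "log_v")))
th <- c(0, 0, 0); lp <- log_post(th)
for (i in seq_len(n_iter)) {
  prop <- th + rnorm(3, sd = 0.1)        # random-walk proposal
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) { th <- prop; lp <- lp_prop }
  draws[i, ] <- th
}
colMeans(draws[-(1:1000), ])             # should land near mu = 1, B = 2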
Anyway, this is how I've thought about it. Corrections welcome.
David
--
David Hsu
PhD Candidate, Urban Planning
University of Washington
(e-mail) dhsu2 at uw.edu