[R-sig-ME] A glmm prediction problem

Thu Jan 31 00:56:52 CET 2008

A couple clarifications:

- It is a little bit of a digression, but for the data I'm looking at,
there really are no fixed effects and the population mean probability of
success is just 0.5: any deviation from that belongs in a random effect.
I can't fit that model directly, so you're right, I do end up with an
estimate for a fixed intercept term, which is always very close to zero.
For many purposes it can be ignored, because it accounts for so little
variance compared to the fitted random effects.

(the data consists of experimental estimates of the proportion of each
allele at genetic markers where an individual is known to have an "AB"
genotype, so we know with certainty that the underlying ratio is 1:1.
I'm using random effects to model sources of variance in the experiment
that cause overdispersion in the observed ratios.)  

- Using your terminology, I want to make predictions for y values for a
previously unseen field (new value of 'a').  I will essentially always
have data for two seed pods (two values of 'c'), and may have several
values of 'b' available, but often just one.

I was thinking to get a likelihood for a new set of y/a/b/c values by
numerically integrating p(random effects)*p(y|random effects) over the
fitted multivariate normal distribution of random effects.  I could
estimate marginal distributions for the individual random effects at the
same time.  This seems manageable for the important case of one new
field, one plant, two pods (integrating over 4 random effects) but gets
out of hand when there are multiple plants.

Alternatively it seems that 'lmer' should be able to do the work for me
if I could prespecify a model.  I found a hint on how to lock down the
random effects from an archived posting on this list (via the start= and
control= parameters) but the intercept would be a problem.

-- Dave

-----Original Message-----
From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas
Bates
Sent: Wednesday, January 30, 2008 12:36 PM
To: David Hinds
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] A glmm prediction problem

On Jan 30, 2008 2:02 PM, David Hinds <David_Hinds at perlegen.com> wrote:
> I'm using lmer with a generalized linear (binomial) mixed model with
> nested random effects, like:

>         y ~ (1 | a / b / c)

> There are no fixed effects.

I would check that.  The model will include a constant term in the
fixed effects.

I don't think it is possible in the current formulation to fit a model
without any fixed effects.  There is nothing in the theory or
computational methods that would preclude that but I be hard pressed
to think of a situation where it would be sensible to do so.  The
distribution of the random effects assumes a mean of zero for all the
random effects terms.  If you don't have any fixed effects terms then
you are assuming that the population mean probability of success is
exactly 0.5, which seems like a pretty strong assumption.

> After fitting the model, I would like to make
> predictions for a new set of y values: specifically, I want to predict
> BLUPs
> for the random effects, and I would like to compute likelihoods for
sets
> of
> y values under the fitted model.

Are the new y values to be at some set of observed levels for the a, b
and c factors?  For example, suppose that factor a is the field,
factor b is the plant selected from the field, factor c is the seed
pod selected from the plant and the observational unit is the seed
within the seed pod.  Do you want to predict for another seed from
that particular seed pod or for a seed in another, as yet unobserved,
seed pod from that plant or for a seed from a seed pod from another,
as yet unobserved, plant from one of the fields you observed or ...

Essentially what will happen is that you will need to use the marginal
variance for units that are as yet unobserved and the conditional
variance for the units that have been observed when determining the
variance of the linear predictor.  Because you have a generalized
linear mixed model you need to convert the variance of the linear
predictor to an associated interval on the mean response through the
inverse link function.  Somewhere along the line what would be the
"residual variance" in a model for a Gaussian family would need to be
incorporated for the binomial family.  I'm not sure exactly how that
would be done.  Suffice it to say that formulating the prediction
interval would not be trivial.

> I don't see a completely straightforward way of doing this since it
> isn't
> the usual sort of prediction problem.  Is this even a sensible thing
to
> do?
>
> -- David Hinds
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>