[R-sig-ME] Calculating upper and lower confidence limits on a population estimate derived from multiple point estimates

Thu Jul 31 02:59:35 CEST 2014

As I understand it, you want a confidence interval for the sum of all 150000 true predictions - is that right?
You can't get that by summing the confidence limits. You should sum the rows of your mm matrix FIRST:

	summ <- apply(mm, 2, sum)

Then compute  

	pred  <-  sum(summ * fixef(glmmadmb_object))
	predvar  <-  sum(summ * (vcov(glmmadmb_object) %*% summ)) 

(This is just what you did before, specialized to the 1-row case. The result is a single number (aka a 1x1 matrix). Then your CI is 

	pred + c(LCL=-1, est=0, UCL=1) * 1.96 * sqrt{predvar)

There's a potential issue with the fact that the vcov() result is a 150000 x 150000 matrix. But I suppose it's rather sparse and stored that way?

Russ

Russell V. Lenth  -  Professor Emeritus
Department of Statistics and Actuarial Science   
The University of Iowa  -  Iowa City, IA 52242  USA   
Voice (319)335-0712 (Dept. office)  -  FAX (319)335-3017

Thank you for your response everyone. I should provide more information on
the method I used.

1.) I fit the model using the glmmadmb package with a negative binomial
distribution.

2.) I created a model matrix (mm) of fixed effects with new data
(n=~150,000), and multiplied this matrix with the vcov(glmmadmb_model)
matrix, according to some excellent documentation and code examples
developed online (r-sig-wikidot-FAQ-GLMM, etc.):

predvar<-diag(mm%*%vcov(glmmadmb_object)%*%t(mm))

3.) Then I took the square root of predvar to get the SE of each prediction
point:

newdata$SE<-sqrt(predvar)

4.) I made predictions with the fixed effects of the glmmadmb model, and
transformed that response with the link (exp(prediction)).

5.) To compute the upper and lower ranges of population estimates at each
point, I used the following:

(exp transformed prediction) +/- 1.96(SE)

The result is three columns of ~150,000 observations in my new dataset:
upper CI, lower CI, and transformed prediction.

As my question states I was hoping to simply sum the columns for a
reach-wide, minimum and maximum population estimate. However, I understand
the caveats of using glmms in this way for predictions. Have I missed
something? I'm hoping to use this method for a manuscript intended for
publication, so it needs to be sound, if that is even possible with all
things considered.