# [R-sig-ME] prediction for glmer objects

burg4401 at uni-trier.de burg4401 at uni-trier.de
Sun Mar 1 13:16:11 CET 2009

Dear list-users,

in my Diploma-Theses I have the need for predicting new Y from a fitted object
on a different dataset. Essentially what i want to do is:

Y ~ X\beta +Zb
and using the \beta and the variance Components of the random effects to get
Ynew ~Xnew\beta + Znew rnorm(b,0,sd(b))

I hacked myself a bit through the (genious) code of lme4 and wanted to ask
whether my code really does what I need. The outcoming vectors have quite
reasonable probablitys. Binomial and poisson are the familys I use.
I'm not interested in EBLUBS nor random slopes models. Just the fixed effects
and the variances of the random (intercept) effects.

my code is:
prediction<-function(model,data,npred){
require(lme4)
#  The function gl is an internal function of lme4
gl <-function(formula, data, family = gaussian, start = NULL,
verbose = FALSE, nAGQ = 1, doFit = FALSE, subset, weights,
na.action, offset, contrasts = NULL, model = TRUE,
control = list(), ...){
mc <- match.call()
fr <- lme4:::lmerFrames(mc, formula, contrasts) # model frame, X, etc.
lme4:::lmerFactorList(formula, fr, 0L, 0L) # flist, Zt, dims
}
beta<-fixef(model)
X <- model.matrix(terms(model), data=data)
fix<-X%*%beta
rm(beta,X)
gc()
gc()
calllist<-
list(family=model at call$family,formula=model at call$formula,data=data,doFit=FALSE,verbose=T)
tmp<-do.call(gl,calllist)
Zt<-lme4:::mkZt(tmp,NULL)
rm(tmp,calllist)
gc()
if(missing(npred)){
raneff<-numeric()
for(i in 1:(length(Zt$Gp)-1)){ raneff<-c(raneff,rnorm(Zt$Gp[i+1]-Zt$Gp[i],mean=0,sd=model at ST[[i]])) } ran<-crossprod(Zt$Zt,raneff)
return(as.numeric(ran)+as.numeric(fix))
}else{
ranff<-matrix(NA,nrow=mult,ncol=length(data[,1]))
for(j in 1:npred){
raneff<-numeric()
for(i in 1:(length(Zt$Gp)-1)){ raneff<-c(raneff,rnorm(Zt$Gp[i+1]-Zt$Gp[i],mean=0,sd=model at ST[[i]])) } ranff[j,]<-as.numeric(crossprod(Zt$Zt,raneff))+as.numeric(fix)
}
return(ranff)
}
}

Any idea or advice would be appreciated. As I have a large dataset on which I
want to make a prediction ( about 6 million with 9 covariates 3 Random effects
with 16 , 550 an 38000 levels) any memory saving idea is also most important
to me.

Many thanks in advance,
Jan Pablo Burgard