[R-sig-ME] prediction for glmer objects

Sun Mar 1 13:16:11 CET 2009

Dear list-users,

in my Diploma-Theses I have the need for predicting new Y from a fitted object 
on a different dataset. Essentially what i want to do is:

	Y ~ X\beta +Zb
	and using the \beta and the variance Components of the random effects to get
	Ynew ~Xnew\beta + Znew rnorm(b,0,sd(b))

I hacked myself a bit through the (genious) code of lme4 and wanted to ask 
whether my code really does what I need. The outcoming vectors have quite 
reasonable probablitys. Binomial and poisson are the familys I use. 
I'm not interested in EBLUBS nor random slopes models. Just the fixed effects 
and the variances of the random (intercept) effects.

my code is:
prediction<-function(model,data,npred){
  require(lme4)
#  The function gl is an internal function of lme4
  gl <-function(formula, data, family = gaussian, start = NULL,
		  verbose = FALSE, nAGQ = 1, doFit = FALSE, subset, weights,
		  na.action, offset, contrasts = NULL, model = TRUE,
		  control = list(), ...){
		  mc <- match.call()
		  fr <- lme4:::lmerFrames(mc, formula, contrasts) # model frame, X, etc.
		  lme4:::lmerFactorList(formula, fr, 0L, 0L) # flist, Zt, dims
  }
  beta<-fixef(model)
  X <- model.matrix(terms(model), data=data)
  fix<-X%*%beta
  rm(beta,X)
  gc()
  gc()
 calllist<-
list(family=model at call$family,formula=model at call$formula,data=data,doFit=FALSE,verbose=T)
  tmp<-do.call(gl,calllist)
  Zt<-lme4:::mkZt(tmp,NULL)
  rm(tmp,calllist)
  gc()
  if(missing(npred)){
	raneff<-numeric()
	for(i in 1:(length(Zt$Gp)-1)){
		  raneff<-c(raneff,rnorm(Zt$Gp[i+1]-Zt$Gp[i],mean=0,sd=model at ST[[i]][1]))
	}
	ran<-crossprod(Zt$Zt,raneff)
	return(as.numeric(ran)+as.numeric(fix))
  }else{
	ranff<-matrix(NA,nrow=mult,ncol=length(data[,1]))
	for(j in 1:npred){
	  raneff<-numeric()
	  for(i in 1:(length(Zt$Gp)-1)){
		  raneff<-c(raneff,rnorm(Zt$Gp[i+1]-Zt$Gp[i],mean=0,sd=model at ST[[i]][1]))
	  }
	  ranff[j,]<-as.numeric(crossprod(Zt$Zt,raneff))+as.numeric(fix)
	}
	return(ranff)
  }
}

Any idea or advice would be appreciated. As I have a large dataset on which I 
want to make a prediction ( about 6 million with 9 covariates 3 Random effects 
with 16 , 550 an 38000 levels) any memory saving idea is also most important 
to me.

Many thanks in advance,
Jan Pablo Burgard