[R-sig-ME] Which residuals to use for GEEs?

Mon Oct 15 02:53:59 CEST 2012

Apologies for what may be a naive question, but I'd appreciate some help with model validation diagnostics for generalized estimating equations. I've searched extensively for the answer to the following, with no luck.

** Question: which residuals should be used when checking model adequacy with GEEs? **

I understand from Hardin & Hilbe 2002 ("Generalized Estimating Equations", CRC press) that plots of raw residuals against observation number can be used to identify time dependence in the residuals. However, I'm confused by the use of _raw_ residuals here. Do they take into account the correlation structure?

In contrast, when fitting a (generalized) linear model with a correlation structure (e.g. with gls/glmmPQL), I understand that standardized residuals (resid(fit, type='normalized')) should be used to look for time dependence (e.g. with an ACF), since they account for the correlation structure used (see e.g. Zuur et al.'s 2009 book, chapter 6). 

So why use one type of residuals for assessing residual indepence in GEEs, and another for (G)LMMs? The only reference I can find to using residuals standardized by the working correlation in a GEE is here: http://www.ufz.de/export/data/1/22564_carl_kuehn_ECOMOD1458.pdf. All other references I've found use the raw residuals.

My data have an auto-regressive temporal dependence, so the Pearson/response residuals show temporal patterns, whether I use a GEE (AR1) or glmmPQL (AR1). However, the standardized residuals (accounting for the correlation structure), show no such patterns for the glmmPQL model. But there's no easy way of calculating the standardized residuals that account for the working correlation in the GEE, and, from my interpretation of the references I've found, I should be using the raw residuals anyway (which I think don't incorporate the correlation structure).

The following code demonstrates the problem I'm having. It creates GEE and glmmPQL models from the dietox dataset, each with an AR1 correlation structure. Patterns are evident in the Pearson and response (raw) residuals for both models. But when plotting the standardized (normalized) residuals for the glmmPQL model, the patterns are resolved. Why can't I plot standardized residuals for the GEE too?

I must be missing something; any clues would be much appreciated!

Thanks in advance for any help, and apologies if this is a stupid question.

Andrew

---------------------------------------------------------------

require(geeglm)
require(ggplot2)
require(gridExtra)
require(MASS)

data(dietox)
dietox$Cu     <- as.factor(dietox$Cu)
dietox$nobs <- row(dietox)[,1]

# GEE model
mf <- formula(Weight~Cu*(Time+I(Time^2)+I(Time^3)))
gee1 <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"),corstr="ar1")

# Plot GEE residuals (Pearson & response)
rp<-resid(gee1,type="pearson")
rr<-resid(gee1,type="response")
dietox2 <- cbind(dietox, rp, rr)
pigs<-data.frame(start=which(!duplicated(dietox$Pig)),  group=dietox$Pig[which(!duplicated(dietox$Pig))])
theme_set(theme_bw())
g1p<-ggplot(dietox2, aes(x=row(dietox)[,1])) + geom_point(aes(y=rp)) + geom_linerange(data=pigs, aes(x=start, ymin=min(rp), ymax=max(rp)), linetype="dotted", colour="red", show_guide=F) + geom_hline(yintercept=0, slope=0) + labs(list(title="Pearson residuals", x="Obs no"))
g1r<-ggplot(dietox2, aes(x=row(dietox)[,1])) + geom_point(aes(y=rr)) + geom_linerange(data=pigs, aes(x=start, ymin=min(rr), ymax=max(rr)), linetype="dotted", colour="red", show_guide=F) + geom_hline(yintercept=0, slope=0) + labs(list(title="Response residuals", x="Obs no"))
grid.arrange(g1p, g1r, nrow=2)

# glmmPQL model
pql <- glmmPQL(fixed=mf, random=~1|Pig, data=dietox, family=poisson, correlation = corAR1(form = ~Time))

# Plot glmmPQL residuals (Pearson, response and standardized)
rp<-resid(pql,type="pearson")
rr<-resid(pql,type="response")
rn<-resid(pql, type="normalized")
dietox3 <- cbind(dietox, rp, rr, rn)
gp<-ggplot(dietox3, aes(x=nobs)) + geom_point(aes(y=rp)) + geom_linerange(data=pigs, aes(x=start, ymin=min(rp), ymax=max(rp)), linetype="dotted", colour="red", show_guide=F) + geom_hline(yintercept=0, slope=0) + labs(list(title="Pearson residuals", x="Obs no"))
gr<-ggplot(dietox3, aes(x=nobs)) + geom_point(aes(y=rr)) + geom_linerange(data=pigs, aes(x=start, ymin=min(rr), ymax=max(rr)), linetype="dotted", colour="red", show_guide=F) + geom_hline(yintercept=0, slope=0) + labs(list(title="Response residuals", x="Obs no"))
gn<-ggplot(dietox3, aes(x=nobs)) + geom_point(aes(y=rn)) + geom_linerange(data=pigs, aes(x=start, ymin=min(rn), ymax=max(rn)), linetype="dotted", colour="red", show_guide=F) + geom_hline(yintercept=0, slope=0) + labs(list(title="Normalized residuals", x="Obs no"))
grid.arrange(gp,gr,gn,nrow=3)