[R] How to interpret these results from a simple gamma-frailty model

Tue Sep 19 14:04:27 CEST 2006

Dear R users,

I'm trying to fit a gamma-frailty model on a simulated dataset, with 6 covariates, and I'm running into some results I do not understand. I constructed an example from my simulation code, where I fit a coxph model without frailty (M1) and with frailty (M2) on a  number of data samples with a varying degree of heterogeneity (I'm running R 2.3.1, running takes ~1 min).

library(survival); set.seed(10000)
lambda <- 0.01			# Exp. hazard rate
# Beta coefficients for Age,TC,HDLC,SBP,Diab,Smok
beta <- c(0.0483,0.0064,-0.0270,0.0037,0.4284,0.5234)   
n <- 1000; Ngrp <- 2;		# Nr patients, Nr frailty groups	
# Thetas for gamma-frailty  
thetaset <- c(1,2,10,100); Ntheta <- length(thetaset);	

# Define the simulated population
age <-rnorm(n,48.6,11.7);tc<-rnorm(n,200,30)
hdlc<-rnorm(n,47,6);sbp<-rnorm(n,135,6) 
rtmp <- runif(0,1,n=n); diab <- rep(0, n); diab[rtmp < 0.05223] <- 1;
rtmp <- runif(0,1,n=n); smok <- rep(0, n); smok[rtmp < 0.40458] <- 1;
# Storage for the estimated coefficients in both models
cf <- 0; length(cf) <- 2*Ntheta*6; dim(cf) <- c(Ntheta,2,6); 

for (i in 1:Ntheta)
{
	l = thetaset[i]
	v=rep(rgamma(n/2,shape=l,scale=1/l),2)		# Shared frailty values
	print(paste("Theta = ",l, " variance in introduced frailty v   = ",var(v)))
	id=rep(seq(1:(n/2)),2); 			# Frailty id's, 
	c <- rep(1,n); 			 		# censoring flags
	r <- runif(n,0,1);				# For random variate generation
	t <- (-1*log(r)) / (lambda*v*exp(age*beta[1]+tc*beta[2]+hdlc*beta[3]+sbp*beta[4]+diab*beta[5]+smok*beta[6])) 
	fitM1=coxph(Surv(t,c)~age+tc+hdlc+sbp+diab+smok); 	# Model 1, no frailty
	cf[i,1,] <- round(fitM1$coef,6)				# Model 1 coefficients
	fitM2=coxph(Surv(t,c)~age+tc+hdlc+sbp+diab+smok+
		frailty(id,dist="gamma",sparse=T,method="em"))  # Model 2, frailty
	cf[i,2,] <- round(fitM2$coef,6)				# Model 2, coefficients

	# store estimated Variance of random effect
	varre=fitM2$history$frailty$history[length(fitM2$history$frailty$history[,1])]  		
	print(paste("Theta = ",l, " variance in estimated random effect= ",varre))
	print(paste("Theta = ",l, " variance in estimated ind. frailty = ", var(exp(fitM2$frail))))
	print(paste("Theta=",l," org beta ",beta," M1: ",cf[i,1,], " M2: ", cf[i,2,]))
	# Calculate the 'actual' 10-year risk and the risk estimated using M1 and M2
	estv <- exp(rep(fitM2$frail,2))		# Individual frailty values from M2
	# Original risk score
	rs<- age*beta[1]+tc*beta[2]+hdlc*beta[3]+sbp*beta[4]+diab*beta[5]+smok*beta[6] 		
	pevent <- exp(-10*lambda)^(exp(rs))		  	# Original 10-year risk
	rsM1<- (cf[i,1,1]*age+cf[i,1,2]*tc+cf[i,1,3]*hdlc+cf[i,1,4]*sbp+cf[i,1,5]*diab+cf[i,1,6]*smok)
	peventM1 <- exp(-10*lambda)^(exp(rsM1))
	rsM2<- (cf[i,1,1]*age+cf[i,1,2]*tc+cf[i,1,3]*hdlc+cf[i,1,4]*sbp+cf[i,1,5]*diab+cf[i,1,6]*smok)
	peventM2 <- exp(-10*lambda)^(estv*exp(rsM1))
	# Proportion of more accurate predictions
	pred <- sum(abs(pevent-peventM2) < abs(pevent-peventM1))/n	
	print(paste("Theta = ",l," proportion pred. M2 more accurate than M1 = ",pred)); 
}

corresponding output:

[1] "Theta =  1  variance in introduced frailty v   =  0.948105787326522"
[1] "Theta =  1  variance in estimated random effect=  1.04435029128552"
[1] "Theta =  1  variance in estimated ind. frailty =  0.520078219281493"
[1] "Theta= 1  org beta  0.0483  M1:  0.025015  M2:  0.045987"  
[2] "Theta= 1  org beta  0.0064  M1:  0.002274  M2:  0.005178"  
[3] "Theta= 1  org beta  -0.027  M1:  -0.014239  M2:  -0.028824"
[4] "Theta= 1  org beta  0.0037  M1:  0.00071  M2:  0.010826"   
[5] "Theta= 1  org beta  0.4284  M1:  0.421602  M2:  0.648227"  
[6] "Theta= 1  org beta  0.5234  M1:  0.360069  M2:  0.593551"  
[1] "Theta =  1  proportion pred. M2 more accurate than M1 =  0.448"
[1] "Theta =  2  variance in introduced frailty v   =  0.515806773271924"
[1] "Theta =  2  variance in estimated random effect=  0.644954876113229"
[1] "Theta =  2  variance in estimated ind. frailty =  0.277209957022078"
[1] "Theta= 2  org beta  0.0483  M1:  0.033213  M2:  0.046216"  
[2] "Theta= 2  org beta  0.0064  M1:  0.005633  M2:  0.009491"  
[3] "Theta= 2  org beta  -0.027  M1:  -0.010048  M2:  -0.016658"
[4] "Theta= 2  org beta  0.0037  M1:  -0.006032  M2:  -0.003639"
[5] "Theta= 2  org beta  0.4284  M1:  0.402992  M2:  0.403226"  
[6] "Theta= 2  org beta  0.5234  M1:  0.457665  M2:  0.691871"  
[1] "Theta =  2  proportion pred. M2 more accurate than M1 =  0.482"
[1] "Theta =  10  variance in introduced frailty v   =  0.104197974543906"
[1] "Theta =  10  variance in estimated random effect=  0.00331174766884151"
[1] "Theta =  10  variance in estimated ind. frailty =  2.30691702541282e-05"
[1] "Theta= 10  org beta  0.0483  M1:  0.048761  M2:  0.048885" 
[2] "Theta= 10  org beta  0.0064  M1:  0.002817  M2:  0.002833" 
[3] "Theta= 10  org beta  -0.027  M1:  -0.024016  M2:  -0.02411"
[4] "Theta= 10  org beta  0.0037  M1:  0.002227  M2:  0.002206" 
[5] "Theta= 10  org beta  0.4284  M1:  0.36471  M2:  0.36624"   
[6] "Theta= 10  org beta  0.5234  M1:  0.531466  M2:  0.532811" 
[1] "Theta =  10  proportion pred. M2 more accurate than M1 =  0.604"
[1] "Theta =  100  variance in introduced frailty v   =  0.0101553285954112"
[1] "Theta =  100  variance in estimated random effect=  0.0245876188637748"
[1] "Theta =  100  variance in estimated ind. frailty =  0.00113624325597910"
[1] "Theta= 100  org beta  0.0483  M1:  0.051439  M2:  0.052775"  
[2] "Theta= 100  org beta  0.0064  M1:  0.002654  M2:  0.002834"  
[3] "Theta= 100  org beta  -0.027  M1:  -0.025281  M2:  -0.025606"
[4] "Theta= 100  org beta  0.0037  M1:  0.005468  M2:  0.005903"  
[5] "Theta= 100  org beta  0.4284  M1:  0.467735  M2:  0.48221"   
[6] "Theta= 100  org beta  0.5234  M1:  0.673852  M2:  0.683147"  
[1] "Theta =  100  proportion pred. M2 more accurate than M1 =  0.564"
Warning message:
Inner loop failed to coverge for iterations 2 3 in: coxpenal.fit(X, Y, strats, offset, init = init, control, weights = weights,  
> 

I don't understand the following:
[1] The variance of the estimated individual frailty values is always much lower than both the original variance and the estimated variance of the random effect. Why is this ? Obviously the variance of the random effect is not calculated directly from the individual frailties after exponentiating them.

[2] Random  effects that are not very large are not even picked up by M2, e.g. theta = 10, yields a variance of ~1/10 in the frailty distribution used to set up the data, however, the estimated variance of that random effect equals only 0.003311. Why is frailty not picked up in this case ? Is it just that the variance of the heterogeneity inserted into the data is too small ?

[3] When I compare the predictive abilities of M1 and M2, by calculating the differences, e.g. between predicted 10-year risks, and determine the proportion of more accurate predictions over the complete sample, when using M2 instead of M1, M2 does not perform relevantly better than M1. For theta = 1 or 2 (substantial heterogeneity) M2 performs even worst than M1, while it should be able to take this heterogeneity into account and model M1 cannot do that. Can anyone explain why using M2 does not lead to better predictions in this situation ? 

[4] Intuitively it seems to me that in absence of heterogeneity (or with very little heterogeneity, e.g. theta=100) both models should be able to estimate the regression coefficients for the 6 covariates accurately, given enough events. With n=1000, the estimated coefficients are still very inaccurate (see e.g. theta=100, beta 2, 4 and 6) even though we have over 150 events/parameter. Even with lots of events (e.g. 50000) the estimates get more accurately but still are way from perfect. Is there an underlying reason for this slow convergence ? 

Is there anyone who can clarify some of these results ? Many thanks in advance for your help.
Erik.

=====================================================
Erik Koffijberg, MSc
Julius Center for Health Sciences and Primary Care
University Medical Center Utrecht Str. 6.131
P.O.Box 85500, 3508 GA Utrecht, The Netherlands
Tel: 	+31 30 250 3013,
Fax:    +31 30 250 5480
E-mail: H.Koffijberg at umcutrecht.nl