[R] Time-dependent coefficients in a Cox model with categorical variants
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Thu Jan 18 20:38:17 CET 2018
First, as others have said please obey the mailing list rules and turn of
First, as others have said please obey the mailing list rules and turn off html, not everyone uses an html email client.
Here is your code, formatted and with line numbers added. I also fixed one error: "y" should be "status".
1. fit0 <- coxph(Surv(futime, status) ~ x1 + x2 + x3, data = data0)
2. p <- log(predict(fit0, newdata = data1, type = "expected"))
3. lp <- predict(fit0, newdata = data1, type = "lp")
4. logbase <- p - lp
5. fit1 <- glm(status ~ offset(p), family = poisson, data = data1)
6. fit2 <- glm(status~ lp + offset(logbase), family = poisson, data = data1)
7. group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf))
8. fit3 <- glm(status ~ -1 + group + offset(p), family = poisson, data = data1)
The key idea of the paper you referenced is that the counterpart to the Hosmer-Lemishow test (wrong if used directly in a Cox model) is to look at the predicted values from a Cox model as input to a Poisson regression. That means adding the expected from the Cox model as a fixed term in the Poisson. And like any other poisson that means offset(log(expected)) as a term.
The presence of time dependent covariates does nothing to change this, per se, since expected for time fixed is the same as for time varying. In practice it does matter, at least philosophically. Lines 1, 2, 5 do this just fine.
If data1 is not the same as data0, a new study say, then the test for intercept=0 from fit1 is a test of overall calibration. Models like line 8 try to partition out where any differences actually lie.
The time-dependent covariates part lies in the fact that a single subject may be represented by multiple lines in data0 and/or data1. Do you want to collapse that person into a single row before the glm fits? If subject "Jones" is represented by 15 lines in the data and "Smith" by 2, it does seem a bit unfair to give Jones 15 observations in the glm fit. But full discussion of this is as much philosophy as statistics, and is perhaps best done over a beer.
Terry T.
________________________________
From: Max Shell [archerrish at gmail.com]
Sent: Wednesday, January 17, 2018 10:25 AM
To: Therneau, Terry M., Ph.D.
Subject: Re: Time-dependent coefficients in a Cox model with categorical variants
Assessing calibration of Cox model with time-dependent coefficients<https://stats.stackexchange.com/questions/323569/assessing-calibration-of-cox-model-with-time-dependent-coefficients>
I am trying to find methods for testing and visualizing calibration to Cox models with time-depended coefficients. I have read your nice article<http://journals.sagepub.com/doi/10.1177/0962280213497434>. In this paper, we can fit three models:
fit0 <- coxph(Surv(futime, status) ~ x1 + x2 + x3, data = data0) p <- log(predict(fit0, newdata = data1, type = "expected")) lp <- predict(fit0, newdata = data1, type = "lp") logbase <- p - lp fit1 <- glm(y ~ offset(p), family = poisson, data = data1) fit2 <- glm(y ~ lp + offset(logbase), family = poisson, data = data1) group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf)) fit3 <- glm(y ~ -1 + group + offset(p), family = poisson, data = data1)
Here，I simplely use data1 <- data0[1:500,]
First, I get following error when running line 5.
Error in eval(predvars, data, env) : object 'y' not found
So I modifited the code by replacing the y as status looks like this:
fit1 <- glm(status ~ offset(p), family = poisson, data = data1) fit2 <- glm(status ~ lp + offset(logbase), family = poisson, data = data1) group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf)) fit3 <- glm(status ~ -1 + group + offset(p), family = poisson, data = data1)
Is this replacing correct?
Second, I try to introduce the time-transform use coxph with ttparament.
My code is: fit0 <- coxph(Surv(time, status) ~ x1 + x2 + x3 + tt(x3), data = data0, function(x, t, ...) x * t) p <- log(predict(fit0, newdata = data1, type = "expected")) lp <- predict(fit0, newdata = data1, type = "lp") logbase <- p - lp fit1 <- glm(status ~ offset(p), family = poisson, data = data1) fit2 <- glm(status ~ lp + offset(logbase), family = poisson, data = data1) group <- cut(lp, c(-Inf, quantile(lp, (1:9) / 10), Inf)) fit3 <- glm(status ~ -1 + group + offset(p), family = poisson, data = data1)
My questions is:
* Is the code above correct?
* How to interpret the fit1, fit2, fit3? What's the connection between the three models and the calibration of the Cox model?
* How to generate the calibration plot using fit3? The article dose have a section discuss this, but no code is provided.
Thank you!
On Mon, Jan 15, 2018 at 9:23 PM, Therneau, Terry M., Ph.D. <therneau at mayo.edu<mailto:therneau at mayo.edu>> wrote:
The model formula " ~ Histology" knows how to change your 3 level categorical variable into two 0/1 dummy variables for a regression matrix. The tt() call is a simple function, however, and ordinary multiplication and does not have those powers. In this case you need to do the setup by hand : create your own 0/1 dummy variables and work with them.
sqcc <- ifelse(dta$Histology == 'Sqcc', 0, 1)
hrac <- ifelse(dta$Histology == 'High risk AC', 0, 1)
fit <- coxph(Surv(time, status) ~ Sex + sqcc + hrac + tt(sqcc) + tt(hrac),
data = dta, tt = list(function(x,t, ...) x*log(t),
function(x, t, ...) x* log(t)))
Terry Therneau
PS I've rarely found x*log(t) to be useful, but perhaps you have already looked at the cox.zph plots and see that shape.
Suppose I have a dataset contain three variants, looks like
head(dta)
Sex tumorsize Histology time status
0 1.5 2 12.1000 0
1 1.8 1 38.4000 0
.....................
Sex: 1 for male; 0 for female., two levels
Histology: 1 for SqCC; 2 for High risk AC; 3 for low risk AC, three levels
Now I need to get a Time-dependent coefficients cox fit:
library(survival)
for(i in c(1,3) dta[,i] <- factor(dta[,i])
fit <-
coxph(
Surv(time, status) ~ Sex + tumorsize + Histology + tt(Histology),
data = dta,
tt = function(x, t, ...) x * log(t)
)
But I keep gettting this error says:
Error in if (any(infs)) warning(paste("Loglik converged before variable ", :
missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(x, log(t)) : ?*? not meaningful for factors.
How can I fix it? I know that the "Sex" and "Histology" are both
categorical variants. I want to have a model that have two ?(t) = a +
blog(t) for each histology level.
Thank you?
[[alternative HTML version deleted]]
More information about the R-help
mailing list