[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

John Fox jfox at mcmaster.ca
Sat Jun 16 17:22:33 CEST 2012


Dear Jurgen,

> fit <- survfit(mod.allison.5, newdata.1, individual=TRUE)
> fit
Call: survfit(formula = mod.allison.5, newdata = newdata.1, individual = TRUE)

records   n.max n.start  events  median 0.95LCL 0.95UCL 
  19809     432     432     114      NA      NA      NA 

plot(fit)  # shows one survival curve

See ?survfit.coxph.

Best,
 John


On Sat, 16 Jun 2012 16:23:41 +0200
 Jürgen Biedermann <juergen.biedermann at googlemail.com> wrote:
> Dear John,
> 
> Thank you very much for the quick answer. I will use the mentioned citation of your book in the future.
> 
> However:
> The Id argument is necessary for the function to know that the different rows belong to one individual.
> 
> Or how the Help-File says:
> 
> id:
> optional variable name of subject identifiers. If this is present, then each group
> of rows with the same subject id represents the covariate path through time of
> a single subject, and the result will contain one curve per subject. If the coxph
> fit had strata then that must also be specified in newdata. If missing, then each
> individual row of newdata is presumed to represent a distinct subject and there
> will be nrow(newdata) times the number of strata curves in the result (one for
> each strata/subject combination).
> 
> I tried your proposal anyway, but, as expected, fifty different survival curves appeared.
> 
> So, there is still no solution.
> 
> Best regards
> Jürgen
> 
> -- 
> -----------------------------------
> Jürgen Biedermann
> Bergmannstraße 3
> 10961 Berlin-Kreuzberg
> Mobil: +49 176 247 54 354
> Home: +49 30 940 53 044
> e-mail: juergen.biedermann at gmail.com
> 
> 
> --------- Korrespondenz ----------
> 
> Betreff: Re: How to specify "newdata" in a Cox-Modell with a time dependent interaction term?
> Von: John Fox <jfox at mcmaster.ca>
> An: J?rgen Biedermann <juergen.biedermann at googlemail.com>
> Datum: 16.06.2012 15:55
> > Dear Jürgen,
> >
> > All the values of your Id variable are equal to 1; how can that make sense? Removing the argument Id=id may get you what you want.
> >
> > By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at<http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>.
> >
> > I hope this helps,
> >   John
> >
> > ------------------------------------------------
> > John Fox
> > Sen. William McMaster Prof. of Social Statistics
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > http://socserv.mcmaster.ca/jfox/
> > 	
> > On Sat, 16 Jun 2012 15:04:48 +0200
> >   Jürgen Biedermann<juergen.biedermann at googlemail.com>  wrote:
> >> Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...
> >>
> >> I don't find a solution to use the "survfit" function (package: survival)  for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous.
> >> I already googled the problem, found many persons having the same or a similar problem, but still no solution.
> >> I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient).
> >>
> >> For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data".
> >> http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
> >>
> >> Do you know any help? See code below
> >>
> >> Thanks very much in advance
> >> Jürgen Biedermann
> >>
> >> #----------------------------------------
> >> #Code
> >>
> >> Rossi<- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T)
> >>
> >> Rossi.2<- fold(Rossi, time='week',
> >>       event='arrest', cov=11:62, cov.names='employed')
> >>
> >> # see below for the fold function from John Fox
> >>
> >> # modeling an interaction with time (Page 14)
> >>
> >> mod.allison.5<- coxph(Surv(start, stop, arrest.time) ~
> >>       fin + age + age:stop + prio,
> >>       data=Rossi.2)
> >> mod.allison.5
> >>
> >> # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5
> >>
> >> newdata.1<- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
> >> fit<- survfit(mod.allison.5,newdata.1,id="Id")
> >>
> >> Error message:
> >>
> >>   >Fehler in model.frame.default(data = newdata.1, id = "Id", formula =
> >> Surv(start,  :
> >>     Variablenlängen sind unterschiedlich (gefunden für '(id)')
> >>
> >> -->  failure, length of variables are different.
> >>
> >> #-----------------------------------------------------------------
> >> fold<- function(data, time, event, cov,
> >>       cov.names=paste('covariate', '.', 1:ncovs, sep=""),
> >>       suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
> >>       vlag<- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
> >>       xlag<- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
> >>       all.cov<- unlist(cov)
> >>       if (!is.list(cov)) cov<- list(cov)
> >>       ncovs<- length(cov)
> >>       nrow<- nrow(data)
> >>       ncol<- ncol(data)
> >>       ncov<- length(cov[[1]])
> >>       nobs<- nrow*ncov
> >>       if (length(unique(c(sapply(cov, length), length(cov.times)-1)))>  1)
> >>           stop(paste(
> >>               "all elements of cov must be of the same length and \n",
> >>               "cov.times must have one more entry than each element of
> >> cov."))
> >>       var.names<- names(data)
> >>       subjects<- rownames(data)
> >>       omit.cols<- if (!common.times) c(all.cov, cov.times) else all.cov
> >>       keep.cols<- (1:ncol)[-omit.cols]
> >>       nkeep<- length(keep.cols)
> >>       if (is.numeric(event)) event<- var.names[event]
> >>       times<- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
> >>           else data[, cov.times]
> >>       new.data<- matrix(Inf, nobs, 3 + ncovs + nkeep)
> >>       rownames<- rep("", nobs)
> >>       colnames(new.data)<- c('start', 'stop', paste(event, suffix, sep=""),
> >>           var.names[-omit.cols], cov.names)
> >>       end.row<- 0
> >>       for (i in 1:nrow){
> >>           start.row<- end.row + 1
> >>           end.row<- end.row + ncov
> >>           start<- times[i, 1:ncov]
> >>           stop<- times[i, 2:(ncov+1)]
> >>           event.time<- ifelse (stop == data[i, time]&  data[i, event] ==
> >> 1, 1, 0)
> >>           keep<- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T)
> >>           select<- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs),
> >> 1, all)
> >>           rows<- start.row:end.row
> >>           cov.mat<- xlag(matrix(unlist(data[i, all.cov]),
> >> nrow=length(rows)), lag)
> >>           new.data[rows[select], ]<-
> >>               cbind(start, stop, event.time, keep, cov.mat)[select,]
> >>           rownames[rows]<- paste(subjects[i], '.', seq(along=rows), sep="")
> >>           }
> >>       row.names(new.data)<- rownames
> >>       as.data.frame(new.data[new.data[, 1] != Inf&
> >>           apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
> >>       }
> >> #-----------------------------------------------------------------
> >>



More information about the R-help mailing list