[R] val.surv
Frank Harrell
f.harrell at vanderbilt.edu
Sun Aug 21 19:01:15 CEST 2011
I welcome a trivial self-contained set of code, not using frailties, that
generates its own data and which fails. Then I'll debug.
Frank
David Winsemius wrote:
>
> On Aug 21, 2011, at 7:03 AM, Salvo Mac wrote:
>
>> I've attached a sample of the data sets, this is my full code.
>>
>> #R 2.13
>> #library(survival)
>> #library(Hmisc)
>> #library(splines)
>> library(rms)
>> train<-as.data.frame( train<-read.csv("G:\\train.txt", header=T,
>> sep="\t"))
>> test<-as.data.frame( test<-read.table("G:\\test.txt", header=T,
>> sep="\t"))
>> f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
>> val.surv(f.1, newdata=test, u=10)
>>
>>
>> #plot(calibrate(f.1, u=30, B=20))
>
> Salvo;
>
> I'm not sure where the error is coming from (although I will share my
> speculation below). I modified your code somewhat. I was under the
> impression that read.csv had sep="," hardcoded and the read.csv( ...
> seo="\t") would throw and error. The as.data.fame around either of the
> read.* statements is completely unnecessary:
>
> Input code:
> train<-read.table("~/train.txt", header=T, sep="\t")
> test<-read.table("~/test.txt", header=T, sep="\t")
>
> I thought the newdata=test argument should succeed, but it does throw
> the error that you report:
>
> f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
> val.surv(f.1, newdata=test, u=10)
> Error in val.surv(f.1, newdata = test, u = 10) :
> dims [product 210] do not match the length of object [314]
> In addition: Warning message:
> In est.surv + S[, 1] :
> longer object length is not a multiple of shorter object length
>
> I tried sampling from test with replacement to generate a dataframe of
> equal extent and with that object I do not get the same errors:
>
> > extend <- test[sample(1:nrow(test), 314, replace=TRUE), ]
> > f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train)
> > val.surv(f.1, newdata=extend, u=10)
>
> Validation of Predicted Survival at Time= 10 n= 314 , events= 64
>
> hare fit:
>
> dim A/D loglik AIC penalty
> min max
> 1 Add -485.24 976.22 271.33 Inf
> 2 Del -349.57 710.64 5.67 271.33
> 3 Del -346.74 710.72 0.96 5.67
> 4 Del -346.26 715.51 0.01 0.96
> 5 Add -346.25 721.25 0.00 0.01
>
> the present optimal number of dimensions is 2.
> penalty(AIC) was 5.75, the default (BIC), would have been 5.75.
>
> dim1 dim2 beta SE Wald
> Constant -10 0.55 -18.14
> Time 43 0.15 0.015 10.30
>
> Function used to transform predictions:
> function (p) log(-log(p))
>
> Mean absolute error in predicted probabilities: 0.0331
> 0.9 Quantile of absolute errors : 0.0613
>
> I have also tried looking at the code and adding
> options(error=utils::recover) to see if I can identify the point where
> the length mismatch is being generated, (but I am NOT an ace
> debugger). I can see that est.surv is created. I can also get the
> predict(fit, newdata, type = "lp") call to run with train and give
> sensible numbers. You did not create (or at least say so) a datadist.
> I tried that when I saw that "lim" was dependent on datadist limits.
>
> ddT <- datadist(train); options(datadist="ddT")
>
> Seeing the usehare was set to TRUE; I submitted the code section that
> was intended for that situation to a browser session and the the first
> line throws the error that I see at the console:
>
> Enter a frame number, or 0 to exit
>
> 1: val.surv(f.1, newdata = test, u = 10)
>
> Selection: 1
> Called from: top level
> Browse[1]> i <- !is.na(est.surv + S[, 1] + S[, 2])
> Error during wrapup: dims [product 210] do not match the length of
> object [314]
>
> As I read this that line is supposed to create an index of valid rows
> in newdata and the fitted values but is failing in the situation where
> the nrows of the newdata does not match that of the fit.
>
> At this point one option might be trying to generate a structure that
> matches the val.surv output by going through the code and building it
> up bit by bit:
>
> w <- structure(list(harefit = f, p = est.surv, actual = actual,
> pseq = pseq, actualseq = actualseq, u = u, fun = fun,
> n = nrow(S), d = sum(S[, 2]), units = units), class =
> "val.survh")
> return(w)
>
> But a much better option would be to report the error to Frank
> Harrell. I'm copying him since I think your .txt files probably
> reached the list as well as my mailbox.
>
> --
> David.
>
>>
>>
>>
>>
>>
>> ________________________________
>> From: David Winsemius <dwinsemius at comcast.net>
>> To: Salvo Mac <salvo_mac at yahoo.com>
>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>> Sent: Sunday, August 21, 2011 5:29 AM
>> Subject: Re: [R] val.surv
>>
>>
>> On Aug 20, 2011, at 10:25 PM, Salvo Mac wrote:
>>
>>> The test and train are like split data sets, contain similar
>>> variables but from different countries so the two sets are somehow
>>> independent. And yes it is a data frame.
>>
>> What is a data.frame? test and train may be dataframes, but test[,
>> "age"] is not a dataframe.
>>
>>> So I extracted age, time and event.
>>
>> Code? The code you offered before would have created a newdata
>> object (yes, a data.frame) with a single column bearing the same
>> name as the vector argument ,,,,, "test1". Not named "age". Try it.
>> do str on such an object:
>>
>> str(dataframe(test1))
>>
>>
>>> So test is data frame,(age, time, event). does that suffice?
>>
>> It certainly does not allow me to reproduce the error you got (which
>> I still think is probably related to the structure of your argument
>> to newdata.)
>>
>> That's all I can say without data and code.
>>
>> --David.
>>
>>
>>>
>>>
>>>
>>> ________________________________
>>> From: David Winsemius <dwinsemius at comcast.net>
>>>
>>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>>> Sent: Sunday, August 21, 2011 3:19 AM
>>> Subject: Re: [R] val.surv
>>>
>>>
>>> On Aug 20, 2011, at 8:08 PM, Salvo Mac wrote:
>>>
>>>> Thanks David
>>>>
>>>> However, I tried your trick on val.surv with newdata=test['age']
>>>> but still didn't work.
>>>> Still gives the same error message:
>>>>
>>>> Error in val.surv(f.1, newdata = test1["age"], u = 10) :
>>>> dims [product 1797] do not match the length of object [2496]
>>>> In addition: Warning message:
>>>> In est.surv + S[, 1] :
>>>> longer object length is not a multiple of shorter object length
>>>>
>>>
>>> As I said (and you did
>>> not act upon):
>>>
>>> The fundamental thing you are doing wrong for q1 is failing to
>>> unambiguously describe the test object.
>>>
>>> I said it was a guess. Now stop wasting our time and offer what is
>>> needed.
>>>
>>> --david.
>>>>
>>>> Salvo
>>>> ________________________________
>>>> From: David Winsemius <dwinsemius at comcast.net>
>>>>
>>>> Cc: "r-help at R-project.org" <r-help at r-project.org>
>>>> Sent: Sunday, August 21, 2011 12:55 AM
>>>> Subject: Re: [R] val.surv
>>>>
>>>>
>>>> On Aug 20, 2011, at 3:32 PM, Salvo Mac wrote:
>>>>
>>>>> Dear R-users,
>>>>>
>>>>> I have two questions regarding
>>> validation and calibration of Survival regression models.
>>>>>
>>>>> 1. I am trying to calibrate and validate a cox model using
>>>>> val.surv.
>>>>> here is my code:
>>>>> f.1<-cph(Surv(time,event)~age, x=T, y=T, data=train)
>>>>> test1<-test[,"age"]
>>>>> val.surv(f.1, newdata=data.frame(test1), u=10)
>>>>>
>>>>> but I get an error message:
>>>>>
>>>>> Error in val.surv(f.1, newdata = data.frame(testi), u = 10) :
>>>>> dims [product 1797] do not match the length of object [2496]
>>>>> In addition: Warning message:
>>>>> In est.surv + S[, 1] :
>>>>> longer object length is not a multiple of shorter object
>>>>> length
>>>>>
>>>>> I ran the example in the r-documentation but couldn't
>>>>> extract dxy from result.
>>>>>
>>>>> What am I doing wrong?
>>>>
>>>> The fundamental thing you are doing wrong for q1 is failing to
>>>> unambiguously describe the test object. I would think that if test
>>>> were a dataframe then wrapping data.frame around a vector might
>>>> not get it named correctly as 'age'. You might try newdata=
>>>> test['age']. Just a guess.
>>>>
>>>>>
>>>>> 2. In validate and calibrate cph functions. If it is frailty
>>>>> fit, does the the bootstrap resample clusters or just individuals
>>>>
>>>> The code above appears to be dependent on the rms package. The
>>>> frailty function is part of the underlying survival package and I
>>>> do not see it mentioned in the index for Harrell's RMS text. You
>>>> will probably need to wait until Frank comes across this. He is
>>>> generally very good about correction my errors and knowledge gaps.
>>>>
>>>>>
>>>>
>>
>> David Winsemius, MD
>> West Hartford,
>> CT<train.txt><test.txt>______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/val-surv-tp3757712p3758717.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list