[R] party package: ctree - survival data - extracting statistics/predictors
Sarah Bonnin
Sarah.Bonnin at crg.eu
Thu Aug 23 17:41:31 CEST 2012
Dear R users,
I am trying to apply the analysis described in a paper to the data I'm working with.
The data consist of 80 patients for whom I have survival data (time in days, and a binary event indicator) and microarray expression data for 200 genes (continuous predictor variables).
My data set "data.test" has 80 rows and 202 columns (time, event and the 200 genes).
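The following sketch shows the expected layout. It builds a hypothetical simulated stand-in for data.test, and the column names time, event and gene1 to gene200 are assumptions rather than the real data. ctree() expects its data argument to be a data frame, so the numeric matrix is converted first. The event column is coded 0/1 so that Surv(time, event) can use it.

```r
## Hypothetical stand-in for data.test: 80 patients, columns time (days),
## event (0/1) and 200 simulated expression values named gene1..gene200.
set.seed(4)
n <- 80
m <- cbind(time = rexp(n, rate = 1 / 500),
           event = rbinom(n, 1, 0.6),
           matrix(rnorm(n * 200), nrow = n,
                  dimnames = list(NULL, paste0("gene", 1:200))))

## ctree() takes a data frame, so convert the matrix before modelling.
data.test <- as.data.frame(m)
dim(data.test)          # 80 rows, 202 columns
names(data.test)[1:4]   # "time" "event" "gene1" "gene2"
```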
What I want to do is:
- run recursive partitioning on these data to obtain groups of patients that are homogeneous in terms of survival/prognosis;
- extract the "correlation" of each gene's expression (for each of the 200 genes) with recurrence-free survival (time and event): I want to know which variables best predict a poor or good prognosis.
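The second goal can also be addressed gene by gene, independently of the tree. This sketch swaps in a different technique from ctree's permutation tests: it fits one univariate Cox model per gene with the survival package and applies a Bonferroni adjustment over the 200 genes with p.adjust(). The data are simulated, and gene1 is made prognostic on purpose. Nothing here reproduces the original analysis.

```r
library(survival)   # Surv(), coxph()

## Hypothetical simulated data: 80 patients, 200 genes; gene1 is prognostic.
set.seed(5)
n <- 80
genes <- matrix(rnorm(n * 200), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:200)))
time <- rexp(n, rate = exp(0.8 * genes[, "gene1"]))
event <- rbinom(n, 1, 0.8)

## One univariate Cox model per gene; keep the Wald p-value of its coefficient.
p_raw <- apply(genes, 2, function(g) {
  fit <- coxph(Surv(time, event) ~ g, data = data.frame(time, event, g))
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})

## Bonferroni adjustment over the 200 genes, then the top-ranked genes
p_bonf <- p.adjust(p_raw, method = "bonferroni")
head(sort(p_bonf))
```

A screen like this ranks genes by marginal association only. Unlike the tree, it says nothing about which genes matter within a subgroup.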
I am using the function "ctree" from the "party" package.
I came up with this command:
test <- ctree(Surv(time, event) ~ .,
              data = data.test,
              controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                       mincriterion = 0.95, savesplitstats = TRUE),
              ytrafo = function(data) trafo(data, numeric_trafo = rank),
              xtrafo = function(data) trafo(data, surv_trafo = logrank_trafo(data, ties.method = "logrank"))
)
This runs without errors, but as I am not a statistician I find it quite confusing, and I may not be running it properly.
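A note on the command itself. In ctree(), ytrafo is applied to the response, here Surv(time, event), and xtrafo is applied to the predictors, the 200 genes. In the call above the two appear to be swapped. As far as I can tell, this means the rank transformation never reaches the genes. It also means surv_trafo is never used. In any case surv_trafo expects a function, not the result of calling logrank_trafo(data, ...).

The sketch below shows what was probably intended, on simulated data. The response keeps ctree's default transformation, which is log-rank scores for a censored response. Only xtrafo is set, so that each gene is rank-transformed. The tie-handling option stays at its default because the values accepted by ties.method may differ between coin versions. The script uses party's ptrafo() when it is available and otherwise falls back to coin::trafo(); that fallback is an assumption about package versions.

```r
library(survival)   # Surv()
library(party)      # ctree(), ctree_control(), nodes()

## Hypothetical simulated data with the layout of data.test;
## only gene1 affects survival.
set.seed(1)
n <- 80
genes <- matrix(rnorm(n * 200), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:200)))
sim <- data.frame(time = rexp(n, rate = exp(2 * (genes[, "gene1"] > 0))),
                  event = rbinom(n, 1, 0.8),
                  genes)

## Transformation helper: party's ptrafo() if exported, else coin's trafo()
## (assumption: both accept a numeric_trafo argument).
tf <- if (exists("ptrafo")) ptrafo else coin::trafo

ct <- ctree(Surv(time, event) ~ ., data = sim,
            controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                     mincriterion = 0.95, savesplitstats = TRUE),
            ## ytrafo left at its default: log-rank scores for Surv(time, event)
            ## xtrafo: replace each gene's expression values by their ranks
            xtrafo = function(data) tf(data, numeric_trafo = rank))
print(ct)
```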
My technical problem is extracting the test statistics from my "test" object (class BinaryTree), i.e. the p-value of each of the 200 comparisons (survival versus each gene): I would like to know which genes are really associated with survival at each node of the tree.
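The following sketch collects the per-gene results at every split, using simulated data. It relies on the node lists that party stores in the tree slot, with elements terminal, nodeID, left, right and criterion, the same structure reached by test@tree$criterion. It recovers the predictor names from the ModelEnv stored in the data slot. The helper inner_criteria() is hypothetical and not part of party.

Following the ctree_control() help, criterion is read as 1 - p-value. So 1 - criterion gives one p-value per gene and inner node. As far as I understand, these p-values are adjusted for multiplicity when testtype = "Bonferroni".

```r
library(survival)
library(party)

## Hypothetical simulated data: gene1 and gene2 are prognostic, the rest is noise.
set.seed(2)
n <- 80
genes <- matrix(rnorm(n * 200), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:200)))
risk <- 2 * (genes[, "gene1"] > 0) + 1.5 * (genes[, "gene2"] > 0)
sim <- data.frame(time = rexp(n, rate = exp(risk)),
                  event = rbinom(n, 1, 0.8),
                  genes)
ct <- ctree(Surv(time, event) ~ ., data = sim,
            controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                     mincriterion = 0.95))

## Hypothetical helper: walk the tree and return, for each inner node, the
## criterion vector (one value per predictor) stored in node$criterion.
inner_criteria <- function(node) {
  if (node$terminal) return(list())
  here <- list(node$criterion$criterion)
  names(here) <- paste0("node", node$nodeID)
  c(here, inner_criteria(node$left), inner_criteria(node$right))
}

crit <- do.call(cbind, inner_criteria(ct@tree))   # genes x inner nodes
rownames(crit) <- names(ct@data@get("input"))     # predictor names, in model order
pvals <- 1 - crit                                 # p-value scale

## Genes ordered by p-value at the root node (first column)
head(pvals[order(pvals[, 1]), , drop = FALSE])
```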
I tried:
test@tree$criterion$statistic
but its maximum value is 16, so I assume it is not a p-value: what is it?
and:
test@tree$criterion$criterion
Its maximum value is 0.96, its minimum is 0, and only one value is > 0.95.
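The statistic element holds test statistics, not p-values. Larger values mean stronger association, but they only become comparable with mincriterion after conversion to the p-value scale. The base-R illustration below makes two assumptions. First, it treats the statistic as a one-dimensional standardized statistic with an asymptotic standard normal reference, which is roughly what teststat = "max" reduces to for one numeric gene and log-rank scores. Second, it applies a plain Bonferroni correction for 200 genes. party's internal computation may differ in detail.

```r
## Illustration only: standardized statistics -> raw and Bonferroni p-values.
z <- c(1.5, 2.5, 3, 4)              # hypothetical standardized statistics
p_raw <- 2 * pnorm(-abs(z))         # two-sided p-values under N(0, 1)
k <- 200                            # number of genes tested at the node
p_bonf <- pmin(1, k * p_raw)        # Bonferroni: multiply by k, cap at 1
round(data.frame(z, p_raw, p_bonf, one_minus_p = 1 - p_bonf), 4)
```

With these numbers, z = 3 still leaves 1 - p at about 0.46, and only z = 4 passes 0.95. Any gene whose adjusted p-value is capped at 1 ends up with a value of exactly 0 on the 1 - p scale.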
str(test) gives a lot of information, but at the moment it confuses me more than it helps.
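Rather than reading str(test), party's accessor functions return the pieces that matter for the first goal. This sketch uses simulated data. It calls where() to get each patient's terminal node, which defines the patient groups. It then summarizes survival per group with survfit() and survdiff() from the survival package. Finally, it reads the variable used at the root split from nodes().

```r
library(survival)
library(party)

## Hypothetical simulated data; only gene1 affects survival.
set.seed(3)
n <- 80
genes <- matrix(rnorm(n * 200), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:200)))
sim <- data.frame(time = rexp(n, rate = exp(2 * (genes[, "gene1"] > 0))),
                  event = rbinom(n, 1, 0.8),
                  genes)
ct <- ctree(Surv(time, event) ~ ., data = sim)

## Terminal node of each patient = the survival groups found by the tree
sim$node <- factor(where(ct))
table(sim$node)

## Kaplan-Meier curves and a log-rank comparison per terminal node
km <- survfit(Surv(time, event) ~ node, data = sim)
print(km)
survdiff(Surv(time, event) ~ node, data = sim)

## Variable chosen for the first (root) split
nodes(ct, 1)[[1]]$psplit$variableName
```

The same data both chose the groups and are used to compare them. For that reason, the survdiff() p-value is descriptive only and overstates the evidence.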
I want to know:
- whether my "ctree" command makes sense to people with more experience of this kind of data;
- which elements of "test" hold which statistics, and how to interpret them. As I understand it, setting "mincriterion" to 0.95 is equivalent to a p-value threshold of 0.05 (from the ctree help: "when 'mincriterion = 0.95', the p-value must be smaller than $0.05$ in order to split this node.").
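On the mincriterion question: the ctree_control() help describes mincriterion as the value of 1 - p-value that must be exceeded for a split. With mincriterion = 0.95, a node is therefore split only if the smallest (adjusted) p-value among the genes is below 0.05, and the split uses the gene with the largest criterion. The small base-R illustration below uses hypothetical criterion values for five genes at one node.

```r
## Hypothetical criterion values (1 - adjusted p-value) for five genes at one node
criterion <- c(geneA = 0.00, geneB = 0.31, geneC = 0.96, geneD = 0.12, geneE = 0.80)
mincriterion <- 0.95
p_adj <- 1 - criterion

## Split only if the largest criterion exceeds mincriterion, which is the same
## as the smallest adjusted p-value being below 1 - mincriterion = 0.05.
split_node <- max(criterion) > mincriterion
split_var <- if (split_node) names(which.max(criterion)) else NA_character_
split_var
```

This is consistent with the observation that exactly one value exceeds 0.95 at the root. That gene is the one the root split uses.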
I hope my explanation is clear. I might be completely mistaken, so any tips or guidance are more than welcome.
Thanks!
Sarah
sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 grid splines stats graphics grDevices utils datasets methods
[10] base
other attached packages:
[1] biomaRt_2.10.0 party_1.0-2 vcd_1.2-13 colorspace_1.1-1 MASS_7.3-20
[6] strucchange_1.4-7 sandwich_2.2-9 zoo_1.7-7 coin_1.0-21 mvtnorm_0.9-9992
[11] modeltools_0.2-19 survival_2.36-14
loaded via a namespace (and not attached):
[1] lattice_0.20-6 RCurl_1.91-1.1 tools_2.14.2 XML_3.9-4.1
------------------
Sarah Bonnin
Bioinformatician
Centre for Genomic Regulation
C/ Dr. Aiguader, 88
08003 Barcelona, Spain
------------------