[BioC] Re: Cox Model

Thu Feb 14 04:46:19 CET 2008

Eleni,

Note that some of the genes declared significant in a univariate
analysis could be highly correlated with each other. Thus, some of the selected genes
would be redundant and add little information when building the multivariate model.
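Here is a minimal R sketch of one way to drop such redundant genes. The helper `drop_correlated()` and the 0.8 cutoff are hypothetical choices, not taken from any package. Genes are visited in order of univariate significance, and a gene is kept only if its absolute Pearson correlation with every gene already kept is at or below the cutoff. The data are simulated.

```r
# Hypothetical helper (not from any package): greedy redundancy filter.
# Walk through genes in the given order (e.g. most significant first) and keep
# a gene only if |correlation| with every gene already kept is <= max_cor.
drop_correlated <- function(x, ordered_ids, max_cor = 0.8) {
  kept <- character(0)
  for (g in ordered_ids) {
    if (length(kept) == 0 ||
        all(abs(cor(x[, g], x[, kept, drop = FALSE])) <= max_cor)) {
      kept <- c(kept, g)
    }
  }
  kept
}

set.seed(1)
n <- 80
base <- rnorm(n)
# Simulated data: g1 and g2 are near-duplicates, g3 is unrelated to both
x <- cbind(g1 = base,
           g2 = base + rnorm(n, sd = 0.05),
           g3 = rnorm(n))

drop_correlated(x, c("g1", "g2", "g3"))  # g2 is dropped as redundant with g1
```

In practice `ordered_ids` would be the gene names sorted by their univariate p-values, so that within a group of near-duplicates the most significant member is the one retained.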

You might want to consider reducing the dimensionality by first grouping
the genes into clusters with similar expression patterns. There are many
techniques; one of the earliest, and the one I can recall now, is gene shaving.
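Gene shaving itself is not shown here. As a simpler stand-in for the same idea, this sketch clusters genes with ordinary hierarchical clustering (`hclust` from base R) on a 1 - correlation distance. It then cuts the tree into a chosen number of clusters and summarizes each cluster by its mean standardized expression. The result is a handful of "metagene" covariates instead of thousands of genes. The number of clusters (`k = 2`) and the simulated data are assumptions for illustration only.

```r
set.seed(2)
n <- 80
# Simulated expression: two groups of 5 co-expressed genes, each group
# driven by its own latent factor (f1 or f2) plus independent noise
f1 <- rnorm(n); f2 <- rnorm(n)
genes <- cbind(sapply(1:5, function(i) f1 + rnorm(n, sd = 0.3)),
               sapply(1:5, function(i) f2 + rnorm(n, sd = 0.3)))
colnames(genes) <- paste0("g", 1:10)

# Distance between genes: 1 - correlation, so co-expressed genes are close
d <- as.dist(1 - cor(genes))
tree <- hclust(d, method = "average")
cluster_id <- cutree(tree, k = 2)

# One summary covariate per cluster: mean of the standardized genes in it
ids <- sort(unique(cluster_id))
metagenes <- sapply(ids, function(k)
  rowMeans(scale(genes[, cluster_id == k, drop = FALSE])))
colnames(metagenes) <- paste0("cluster", ids)
dim(metagenes)
```

The columns of `metagenes` could then be used as covariates in `coxph()` in place of the individual genes.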

Or you can pre-select some genes based on variability measures etc.
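A minimal sketch of variability-based pre-selection follows. It assumes the interquartile range as the variability measure and keeps the 100 most variable genes; both are illustrative choices. Because this filter ignores the survival outcome, it is less prone to the selection bias that comes with screening on outcome-based p-values.

```r
set.seed(3)
n <- 80; p <- 1000
# Simulated matrix: each gene has its own spread (sd between 0.1 and 3)
sds <- runif(p, 0.1, 3)
genes <- sapply(sds, function(s) rnorm(n, sd = s))
colnames(genes) <- paste0("g", seq_len(p))

# Rank genes by interquartile range (robust to outliers) and keep the top k
k <- 100
iqr <- apply(genes, 2, IQR)
top_genes <- names(sort(iqr, decreasing = TRUE))[seq_len(k)]
genes_filtered <- genes[, top_genes]
dim(genes_filtered)
```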

Regards, Adai

Eleni Christodoulou wrote:
> Hi,
> 
> Thanks for the replies. I will probably try to perform a survival analysis on
> each of the genes to get gene-wise p-values, then select the most
> significant ones (those below a certain p-value threshold) and proceed to a
> full Cox regression using only those genes. Do you think that this
> makes sense?
> 
> Thanks a lot,
> Eleni
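The screening plan described above (univariate Cox models, then a joint model) can be sketched in R with the survival package as follows. The data are simulated, and only `g1` and `g2` truly affect the hazard. The Benjamini-Hochberg adjustment via `p.adjust()`, the 0.05 threshold and the cap of 10 genes are assumptions added here, because with thousands of genes many raw p-values fall below any fixed cutoff by chance. Note also that the p-values of the final model will be optimistic, since the same data were used to select the genes.

```r
library(survival)

set.seed(4)
n <- 80; p <- 500
genes <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(NULL, paste0("g", seq_len(p))))

# Simulated outcome: only g1 and g2 affect the hazard; independent censoring
true_lp <- genes[, "g1"] - genes[, "g2"]
event_time <- rexp(n, rate = exp(true_lp))
censor_time <- rexp(n, rate = 0.3)
time <- pmin(event_time, censor_time)
relapse <- as.integer(event_time <= censor_time)

# Step 1: one univariate Cox model per gene, keeping its Wald p-value
pvals <- apply(genes, 2, function(g) {
  fit <- coxph(Surv(time, relapse) ~ g)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})

# Step 2: correct for testing p genes, then keep at most 10 of the strongest hits
padj <- p.adjust(pvals, method = "BH")
selected <- head(names(sort(padj[padj < 0.05])), 10)

# Step 3: joint Cox model using only the selected genes
dat <- data.frame(time, relapse, genes[, selected, drop = FALSE])
multi_fit <- coxph(Surv(time, relapse) ~ ., data = dat)
summary(multi_fit)
```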
> 
> On Feb 13, 2008 2:11 PM, <phguardiol at aol.com> wrote:
> 
>>  Hi,
>> wouldn't it make sense to first reduce the dimensionality before
>> undertaking such a survival analysis? Certainly, some of your genes have
>> similar expression profiles across samples...?
>>  Best,
>>  Philippe Guardiola
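One common way to do such a reduction uses principal components (`prcomp` in base R); it is not a method named in this thread. The genes are replaced by the first few component scores, and the Cox model is fitted on those. With 80 samples there are at most 80 components, so decomposing an 80 x 18000 matrix is cheap. The simulated genes, the unrelated random survival times and the choice of 5 components are assumptions for illustration.

```r
library(survival)

set.seed(5)
n <- 80; p <- 2000
genes <- matrix(rnorm(n * p), nrow = n)
# Simulated survival data, unrelated to the genes (mechanics only)
time <- rexp(n, rate = 0.1)
relapse <- rbinom(n, 1, 0.7)

# Principal components of the samples-by-genes matrix, genes centred and scaled
pca <- prcomp(genes, center = TRUE, scale. = TRUE)

# Keep the first few component scores as covariates (5 is an arbitrary choice)
n_pc <- 5
scores <- pca$x[, seq_len(n_pc)]
pc_fit <- coxph(Surv(time, relapse) ~ scores)
summary(pc_fit)
```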
>>
>>
>>  -----Original Message-----
>> From: Ramon Diaz-Uriarte <rdiaz at cnio.es>
>> To: bioconductor at stat.math.ethz.ch
>> Cc: Eleni Christodoulou <elenichri at gmail.com>
>> Sent: Wed, 13 February 2008 11:23
>> Subject: Re: [BioC] Cox Model
>>
>>  Dear Eleni,
>>
>> You are trying to fit a model with 18000 covariates but only 80 samples (of
>> which, at most, only 80 are not censored). Just doing it the way you are
>> trying to do it is unlikely to work or make much sense...
>>
>> You might want to take a look at the work of Torsten Hothorn and colleagues on
>> survival ensembles, with implementations in the R package mboost, and their
>> work on random forests for survival data (see R package party). Some of this
>> functionality is also accessible through our web-based tool SignS
>> (http://signs.bioinfo.cnio.es), which uses the above packages.
>>
>> Depending on your exact question, you might also want to look at the approach
>> of Jelle Goeman for testing whether sets of genes (e.g., your complete set of
>> 18000 genes) are related to the outcome of interest (survival in your case).
>> Goeman's approach is available in the globaltest package from BioC.
>>
>> Hope this helps,
>>
>> R.
>>
>> On Wednesday 13 February 2008 08:10, Eleni Christodoulou wrote:
>>
>>> Hello BioC-community,
>>> I have been struggling for a week now with implementing a Cox
>>> model in R. I have 80 cancer patients, so 80 time measurements and 80
>>> relapse indicators (the censoring variable: 1 if the patient relapsed during
>>> the examined period, 0 if not). My microarray data contain around 18000 genes,
>>> so I have the expression of 18000 genes in each of the 80 tumors (an
>>> 80*18000 matrix). I would like to build a Cox model in order to retrieve the
>>> most significant genes (according to the p-value). The command that I am
>>> using is:
>>> library(survival)
>>> test1 <- list(time = time, relapse = relapse, genes = genes)
>>> coxph(Surv(time, relapse) ~ genes, data = test1)
>>> where time is a vector of length 80 containing the times, relapse is a vector
>>> of length 80 containing the relapse values and genes is an 80*18000 matrix.
>>> When I run the coxph command I get an error saying that it cannot
>>> allocate a vector of size 2.7Mb (on Windows). I also tried Linux, where I
>>> get an error that the maximum memory has been reached. I increased the
>>> memory by starting R with the command:
>>> R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
>>> I think it cannot get better than that, because if I try, for example,
>>> max-vsize=300 the memory capacity is stored as NA.
>>> Does anyone have any idea why this happens and how I can overcome it?
>>> I would be really grateful if you could help!
>>> It has been bothering me a lot!
>>> Thank you all,
>>> Eleni
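A back-of-the-envelope calculation helps explain the memory errors. A fitted `coxph` object stores the variance matrix of the coefficients in `fit$var`, which is p x p for p coefficients. Assume that at least this one dense matrix of 8-byte doubles must be built. Then 18000 genes need about 2.6 GB for it alone, far more than the 250M vector-heap limit set above. The 80 x 18000 expression matrix itself needs only about 11.5 MB. The helper `dense_matrix_gb()` is a hypothetical illustration, not part of any package. Even with enough memory, 18000 coefficients could not be estimated sensibly from at most 80 events.

```r
# Hypothetical helper: size of a dense matrix of doubles (8 bytes per entry),
# reported in GB (1e9 bytes)
dense_matrix_gb <- function(nrow, ncol) nrow * ncol * 8 / 1e9

p <- 18000  # number of genes, i.e. Cox coefficients
n <- 80     # number of patients

var_matrix_gb  <- dense_matrix_gb(p, p)        # p x p variance matrix
data_matrix_mb <- dense_matrix_gb(n, p) * 1e3  # n x p expression matrix, in MB

cat(sprintf("p x p matrix: %.2f GB; n x p data: %.2f MB\n",
            var_matrix_gb, data_matrix_mb))
# prints "p x p matrix: 2.59 GB; n x p data: 11.52 MB"
```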
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> Ramón Díaz-Uriarte
>> Statistical Computing Team
>> Centro Nacional de Investigaciones Oncológicas (CNIO)
>> (Spanish National Cancer Center)
>> Melchor Fernández Almagro, 3
>> 28029 Madrid (Spain)
>> Fax: +-34-91-224-6972
>> Phone: +-34-91-224-6900
>> http://ligarto.org/rdiaz
>> PGP KeyID: 0xE89B3462
>> (http://ligarto.org/rdiaz/0xE89B3462.asc)