[BioC] fix

Chad Shaw cashaw at bcm.tmc.edu
Thu Dec 18 19:10:04 MET 2003


Stephen:

> I agree with some of what you say, Chad. The problem is that most
> multivariate methods are built on top of the marginal tests. For instance,
> machine learning methods are based on gene subsets for each of k
> cross-validations.

Right. I recognize that gene selection is a central component of many
sequential data analysis schemes: at "stage 1" you pick a set of genes
which (by some selection scheme) show regulation in the array experiments,
then at "stage 2" you do something with that set.
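
To make the two-stage scheme concrete, here is a minimal sketch in Python.
It is only an illustration, not the procedure either of us actually uses:
the Welch t statistic stands in for whichever marginal test you prefer
(fold/T/F/cyber-T), and the gene names, expression values, and the
function names welch_t and stage1_select are all made up for the example.

```python
import math
import statistics


def welch_t(x, y):
    """Welch t statistic for two samples with possibly unequal variances."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def stage1_select(expr, labels, k):
    """Stage 1: score each gene on its own and keep the k largest |t|."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, g in zip(values, labels) if g == 0]
        b = [v for v, g in zip(values, labels) if g == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three arrays per condition, three genes.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [5.0, 4.8, 5.1, 4.0, 4.2, 3.9],
}

selected = stage1_select(expr, labels, k=2)

# Stage 2 (classifier, clustering, ...) only ever sees the selected genes.
stage2_input = {gene: expr[gene] for gene in selected}
```

Note that each gene is scored in isolation at stage 1, so nothing about
how the genes relate to one another enters the selection.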

My comment is still that this is a bad approach. I'm guilty of it, too.
We are focusing on the trees instead of the ecosystem, and if we had
better covariate information or knowledge of gene-connectedness we
wouldn't be doing this.

Moreover, if what you are doing at stages 2 through k is based on
'binning' of genes, then a low frequency of false positives at stage 1
will matter less, and so will slightly sub-optimal single-gene power.
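
A back-of-the-envelope calculation shows the kind of thing I mean about
binning. Every number here is an assumption chosen for illustration (bin
size, number of truly regulated genes, power, false positive rate), and
expected_hits is just a helper name for this sketch.

```python
def expected_hits(n_true, n_null, power, fpr):
    """Expected number of stage-1 selections in a bin of genes."""
    return n_true * power + n_null * fpr


# Assumed setup: bins of 200 genes; one bin holds 40 truly regulated genes.
bin_size = 200
true_in_bin = 40
fpr = 0.01  # assumed per-gene false positive rate at stage 1

# A bin with no regulated genes collects only false positives.
background = expected_hits(0, bin_size, 0.0, fpr)

# Expected hits in the enriched bin for two levels of single-gene power.
results = {}
for power in (0.7, 0.6):
    results[power] = expected_hits(
        true_in_bin, bin_size - true_in_bin, power, fpr
    )

for power, hits in results.items():
    print(f"power={power:.1f}: enriched bin {hits:.1f} vs background {background:.1f}")
```

Under these assumptions the enriched bin expects roughly 26 to 30 selected
genes against about 2 in a background bin, so dropping power from 0.7 to
0.6 barely changes the bin-level picture.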

> Use of the appropriate test (fold/T/F/cyber-T/etc.) for subset
> selection is IMHO the most important choice.

Yes, I agree. It's just that the fixation on this topic, to the exclusion
of what seem to be other scientifically relevant topics, is both maddening
and disheartening.

CAS


