[BioC] Selecting genes for machine learning
Djork-Arné Clevert
okko at clevert.de
Mon Jun 27 11:22:56 CEST 2011
Dear January,
If you have Affymetrix data, you could try filtering genes by their
information content. The method is described in the Bioinformatics paper
"I/NI-calls for the exclusion of non-informative genes: a highly effective
filtering tool for microarray data", available at
http://bioinformatics.oxfordjournals.org/content/23/21/2897.full.
The I/NI filter is included in our farms package, which, according to the
Affycomp benchmark, is the leading summarization method with respect to
sensitivity and specificity.
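
On the nsFilter part of the question quoted below: as I read the genefilter
documentation, var.cutoff is by default treated as a quantile of a per-gene
spread measure, and the default measure (var.func) is the IQR rather than the
variance. So var.cutoff=0.75 should keep roughly the most variable quarter of
the genes. The following is only a minimal sketch of that idea in plain
Python, not genefilter's actual code. The helper names and the toy expression
values are made up for illustration, and the handling of ties at the
threshold may differ from nsFilter.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def iqr(values):
    """Interquartile range: a robust measure of spread."""
    return quantile(values, 0.75) - quantile(values, 0.25)


def filter_by_iqr(expr, cutoff=0.75):
    """Keep genes whose IQR exceeds the `cutoff` quantile of all per-gene IQRs.

    `expr` maps a gene name to its expression values across samples.
    """
    spread = {gene: iqr(vals) for gene, vals in expr.items()}
    threshold = quantile(list(spread.values()), cutoff)
    return [gene for gene, s in spread.items() if s > threshold]


# Toy, made-up log2 expression values: four samples per gene.
expr = {
    "geneA": [7.0, 7.1, 7.0, 7.1],
    "geneB": [5.0, 9.0, 5.5, 8.5],
    "geneC": [6.0, 6.4, 6.1, 6.3],
    "geneD": [8.0, 8.0, 8.1, 8.0],
}
kept = filter_by_iqr(expr, cutoff=0.75)
print(kept)
```

With these toy values only geneB passes the threshold. Note that such a
filter ranks genes by overall variability alone and does not look at the
class labels.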
Greetings from Berlin,
Okko
--
dipl.-inf. djork clevert | gleimstr. 13a | d-10437 berlin
e: okko at clevert.de
p: +49.30.4432 4702
f: +49.30.6883 5307
On 24.06.2011 at 16:27, January Weiner wrote:
> Dear all,
>
> what is currently regarded as the optimal strategy to select genes for
> machine learning analysis? Taking all of the 40k or so genes is not
> doable (at least with randomForest, which I use). "Bioconductor case
> studies" suggests using nsFilter with argument var.cutoff=0.75,
> however I am not sure how that is calculated. Are the genes sorted
> according to absolute variance? If yes, is that method really suitable
> for filtering "uninteresting" genes?
>
> Kind regards,
>
> January
>
> --
> -------- Dr. January Weiner 3 --------------------------------------
> Max Planck Institute for Infection Biology
> Charitéplatz 1
> D-10117 Berlin, Germany
> Web : www.mpiib-berlin.mpg.de
> Tel : +49-30-28460514