[BioC] Machine learning

Wolfgang Huber huber at ebi.ac.uk
Fri Mar 2 09:56:36 CET 2007


Dear Weiyin,

The code you mention just filters out genes that are expressed at low
levels or show little variation across samples. Such genes are unlikely
to help the classification task, but they aggravate the 'curse of
dimensionality' problem.

You might also consider a more recent lab, from the 2006 course in 
Brixen, with nicer explanations, code etc.:
http://www.economia.unimi.it/projects/marray/2006/material/Lab3/MachineLearning/
(progress happens!)

More answers below.

> I am trying to do classification on a data set of 92 Affymetrix arrays
> using Random Forest. I followed the examples in the "Machine Learning
> Lab" of June 11, 2004, but I have trouble understanding some of the
> code for the non-specific filtering step.
>
> Here is the code in the paper:
>
>     library(genefilter)
>     f1 <- pOverA(0.25, log2(200))
>     f2 <- function(x) (IQR(x) > 0.5)
>     ff <- filterfun(f1, f2)
>     selected <- genefilter(eset, ff)
>     sum(selected)
>
> For "pOverA(0.25, log2(200))", does this mean that it returns TRUE if
> 25% of the samples' expression values for the same gene are > log2(200)?

Indeed. More precisely, it returns TRUE if at least 25% of the values
are greater than log2(200).
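For illustration, here is a rough base-R sketch of the test that
pOverA(0.25, log2(200)) constructs. It is not genefilter's own source:
the helper name `p_over_a` and the toy vectors are hypothetical, and the
values are assumed to be on the log2 scale already, which is why the
threshold is log2(200) rather than 200.

```r
# Hypothetical helper (not genefilter's own code) illustrating what
# pOverA(p, A) tests: TRUE when at least a fraction p of the
# non-missing values in x are strictly greater than A.
p_over_a <- function(p, A) {
  function(x) {
    x <- x[!is.na(x)]   # drop missing values
    mean(x > A) >= p    # fraction of samples above the threshold
  }
}

f1 <- p_over_a(0.25, log2(200))   # log2(200) is roughly 7.64

# One gene on 8 arrays (log2 scale): 2 of 8 values (25%) exceed 7.64
gene_a <- c(5, 6, 6.5, 7, 7.2, 7.5, 8, 9)
# Another gene: only 1 of 8 values (12.5%) exceeds 7.64
gene_b <- c(5, 6, 7, 8, 6, 6, 6, 6)

f1(gene_a)   # TRUE
f1(gene_b)   # FALSE
```

Note that exactly 25% is enough, since the comparison is "at least p".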

> For "f2 <- function(x) (IQR(x) > 0.5)", I read the help page but still
> don't understand it. I assume it has something to do with filtering out
> genes that show little variation across samples.
> 

f2 returns TRUE if the interquartile range of x (the difference between
the 75th and 25th percentiles) is greater than 0.5.
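A small sketch of what this means in practice, using two made-up genes.
On log2-scale data, an IQR of 0.5 means the middle half of the samples
spans a factor of about 2^0.5, roughly 1.41-fold. Base R's IQR() uses
quantile() with its default (type 7) definition.

```r
# f2 as in the lab code: TRUE when the interquartile range exceeds 0.5
f2 <- function(x) IQR(x) > 0.5

# Two hypothetical genes on 6 arrays (log2 scale)
flat   <- c(7.0, 7.1, 7.1, 7.2, 7.2, 7.3)   # little variation
varied <- c(5.0, 6.0, 7.0, 8.0, 9.0, 10.0) # clear variation

# IQR is the 75th minus the 25th percentile
IQR(varied) == quantile(varied, 0.75) - quantile(varied, 0.25)

f2(flat)     # FALSE: IQR is about 0.1
f2(varied)   # TRUE:  IQR is 8.75 - 6.25 = 2.5
```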

> 
> For "ff <- filterfun(f1,f2)", the help page says "the function returns
> FALSE when the first filter function returns FALSE otherwise it return
> TRUE". So why do we need f2 if the result is decided by the first
> function, which is f1 here?

I think the man page needs some help here. Both f1 and f2 need to return
TRUE if the gene is to be selected: the combined function returns FALSE
as soon as any one of the filters returns FALSE. I will poke its
maintainer.
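The combination logic can be sketched in base R as below. The helpers
`combine_filters` and `apply_filter` are hypothetical stand-ins for
filterfun() and genefilter(), written only to illustrate the "all
filters must pass" rule; the toy matrix is made up, and genes are
assumed to be rows, as in an expression matrix.

```r
# Hypothetical stand-in for filterfun(): TRUE only if every filter
# returns TRUE. Evaluation stops at the first FALSE (or NA), so later
# filters may be skipped, but a gene is kept only when all of them pass.
combine_filters <- function(...) {
  filters <- list(...)
  function(x) {
    for (f in filters) {
      v <- f(x)
      if (is.na(v) || !v) return(FALSE)
    }
    TRUE
  }
}

# Hypothetical stand-in for genefilter(eset, ff): apply the combined
# filter to each row (gene) of an expression matrix.
apply_filter <- function(exprs_matrix, ff) {
  apply(exprs_matrix, 1, ff)
}

f1 <- function(x) mean(x > log2(200)) >= 0.25  # same idea as pOverA(0.25, log2(200))
f2 <- function(x) IQR(x) > 0.5
ff <- combine_filters(f1, f2)

# Toy log2 expression matrix: 3 genes x 8 arrays
m <- rbind(
  high_and_variable = c(6, 7, 7.5, 8, 8.5, 9, 9.5, 10),   # passes f1 and f2
  high_but_flat     = c(8, 8, 8.1, 8.1, 8.1, 8.2, 8.2, 8.2), # passes f1, fails f2
  low_and_variable  = c(2, 3, 4, 5, 5.5, 6, 6.5, 7)       # fails f1
)

selected <- apply_filter(m, ff)
selected        # TRUE for high_and_variable only
sum(selected)   # 1
```

The second gene shows why f2 matters: it passes f1, yet is dropped
because it barely varies.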

> Could someone explain this for me?



Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
