[BioC] Cutoff to use for IQR filtering in genefilter

Sim, Fraser Fraser_Sim at URMC.Rochester.edu
Mon Jun 23 19:06:45 CEST 2008


Hi Mark,

Am I right in thinking that using the median of the distribution of
IQRs as the cutoff would remove 50% of the genes in every analysis?

As below:

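library(affy); library(genefilter)
eset <- rma(ReadAffy())            # assumes CEL files in the working directory
IQRs <- esApply(eset, 1, IQR)      # one IQR per probe set
f1 <- function(x) ( IQR(x) > median(IQRs) )
selected <- genefilter(eset, f1)   # logical vector, TRUE = keep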

What happens if more than 50% of genes are variable or, for that matter,
fewer than 50%? Should one plot the IQRs against some value of interest,
e.g. a t-test statistic, and determine the IQR cutoff on that basis?
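
The median-cutoff question above can be checked on simulated data. This is only a sketch: the matrix `expr`, its dimensions, and the 80% quantile are made up for illustration and stand in for real expression values. It shows that a median cutoff keeps half the genes by construction, however many are truly variable, and that any other quantile can be substituted the same way.

```r
set.seed(1)

# Hypothetical log2 expression matrix: 1000 genes x 12 samples
expr <- matrix(rnorm(1000 * 12, mean = 8, sd = 0.3), nrow = 1000)

# Make only the first 100 genes genuinely variable (assumption for the demo)
expr[1:100, ] <- expr[1:100, ] + rnorm(100 * 12, sd = 2)

IQRs <- apply(expr, 1, IQR)

# Median cutoff: half the genes pass, regardless of how many are variable
keep_median <- IQRs > median(IQRs)
mean(keep_median)

# Same filter with a different quantile of the IQR distribution
keep_top20 <- IQRs > quantile(IQRs, 0.80)
mean(keep_top20)
```

The fraction retained is fixed by the chosen quantile, so the choice of quantile is the real decision, not the numeric IQR value it happens to correspond to.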

Thanks, Fraser

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Mark Cowley
Sent: Sunday, June 22, 2008 7:32 PM
To: swhwang10 at yahoo.com
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Cutoff to use for IQR filtering in genefilter

Hi Seungwoo,
The range/IQR/SE/SD of your data depends on a number of factors,
including biological variability and sources of technical variability
such as the choice of normalisation algorithm (think RMA vs MAS5).
Basically, applying a filter with an IQR cutoff of 0.1 in my study might
remove half the genes, whereas in your study it may remove 10% of them.
Suggestions such as Robert's are useful because they use the IQRs of
YOUR data in order to set that cutoff.
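
To illustrate why a fixed cutoff behaves differently between studies, here is a small simulation. Everything in it is assumed for the example: the helper `simulate_iqrs`, the two "typical" per-gene standard deviations, and the log-normal spread of gene variability. It is not a model of any real data set or normalisation method.

```r
set.seed(42)

# Hypothetical helper: simulate per-gene IQRs for a study whose genes
# have standard deviations scattered around `typical_sd`
simulate_iqrs <- function(n_genes, n_samples, typical_sd) {
  gene_sd <- rlnorm(n_genes, meanlog = log(typical_sd), sdlog = 0.5)
  expr <- matrix(
    rnorm(n_genes * n_samples, mean = 8, sd = rep(gene_sd, n_samples)),
    nrow = n_genes
  )
  apply(expr, 1, IQR)
}

iqr_A <- simulate_iqrs(2000, 12, typical_sd = 0.08)  # low-variability study
iqr_B <- simulate_iqrs(2000, 12, typical_sd = 0.25)  # higher-variability study

# Fraction of genes a fixed IQR cutoff of 0.1 would remove in each study
frac_A <- mean(iqr_A < 0.1)
frac_B <- mean(iqr_B < 0.1)
c(study_A = frac_A, study_B = frac_B)
```

The same cutoff of 0.1 removes a much larger share of genes in the low-variability study, which is the reason for deriving the cutoff from the data at hand.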

I suggest calculating the IQRs for all of your genes and then either
plotting them with plot(density(IQRs)) or just trying summary(IQRs),
which will give you a good feel for just how variable your data are.
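
A minimal sketch of that inspection step follows. The matrix `expr` is simulated here as a stand-in for the expression matrix of a real ExpressionSet (for example, what exprs(eset) would return); its size and spread are assumptions.

```r
set.seed(2)

# Stand-in for exprs(eset): 500 genes x 10 samples on the log2 scale
expr <- matrix(rnorm(500 * 10, mean = 8, sd = 0.4), nrow = 500)

# One IQR per gene (row)
IQRs <- apply(expr, 1, IQR)

# Numeric overview of how variable the genes are
print(summary(IQRs))
print(quantile(IQRs, probs = c(0.10, 0.25, 0.50, 0.75, 0.90)))

# Density of the IQRs; the plot is only drawn in an interactive session
d <- density(IQRs)
if (interactive()) plot(d, main = "Distribution of per-gene IQRs")
```

Reading the quantiles next to the density plot shows directly what fraction of genes any candidate cutoff would remove.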

If you need help calculating the IQRs and/or variances of your genes,
please post back to the list.
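
For reference, per-gene IQRs and variances can be computed side by side as sketched below. The data are simulated, and the conversion between IQR and variance assumes roughly normally distributed values per gene, in which case the IQR is about 1.349 standard deviations. The helper name `iqr_to_var` is made up for this example.

```r
set.seed(3)

# Simulated stand-in for an expression matrix: 2000 genes x 20 samples
expr <- matrix(rnorm(2000 * 20, mean = 8, sd = 0.5), nrow = 2000)

gene_iqr <- apply(expr, 1, IQR)   # per-gene interquartile range
gene_var <- apply(expr, 1, var)   # per-gene sample variance

# For a normal distribution, IQR = 2 * qnorm(0.75) * sd (about 1.349 * sd)
iqr_per_sd <- 2 * qnorm(0.75)

# Hypothetical helper: variance implied by a given IQR under normality
iqr_to_var <- function(iqr) (iqr / iqr_per_sd)^2

# Implied variances for IQRs of 0.1 and 0.5: roughly 0.0055 and 0.137
iqr_to_var(c(0.1, 0.5))

# On real data the two measures agree only approximately
cor(gene_iqr, gene_var)
```

With few samples per gene the sample IQR is a noisy estimate, so the conversion is a rough guide rather than an exact equivalence.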

cheers,
Mark

On 22/06/2008, at 9:05 PM, Seungwoo Hwang wrote:

> I am wondering what cutoff value I should use for IQR filtering in
> genefilter. I did a literature search, and it varies from paper to
> paper. I have read two papers so far. One used 0.5, the other used
> 0.18. affylmGUI offers options of 0.5, 0.25, and 0.1.
>
> I also searched the Bioconductor archive and read that Dr. Robert
> Gentleman suggested filtering out the genes whose IQR is below the
> median, rather than below some fixed value.
>
> I have two questions in this vein.
>
> (1) How small is a gene's variance (numerically) if its IQR
> is some value, say, 0.5 or 0.1? Can I calculate it?
> (2) When the median is used instead of a fixed number, wouldn't it be
> too large, since the median of a gene's expression intensities across
> samples can be anything?
>
> Thanks,
>
> Seungwoo
> ------------------------------------
> Seungwoo Hwang, Ph.D.
> Senior Research Scientist
> Korean Bioinformation Center
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

----------------------------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)

Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
