[BioC] Cutoff to use for IQR filtering in genefilter

Mark Cowley m.cowley0 at gmail.com
Tue Jun 24 02:07:48 CEST 2008


Hi Fraser,
that's exactly right: using the median IQR as the filter will remove
50% of your genes every time.
An alternative would be to use the 20th percentile of the IQRs as your
filter, which removes the least variable 20%.
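
As a rough sketch of the percentile idea, using base R only: the simulated
matrix `expr` and the 20% cutoff are illustrative assumptions, not values
from anyone's real data. With an ExpressionSet you would compute the IQRs
with esApply(eset, 1, IQR) as in the quoted code below.

```r
set.seed(1)

# Simulated log2 expression matrix: 1000 genes (rows) x 12 samples (columns).
# Each gene gets its own standard deviation so the per-gene IQRs differ.
n_genes <- 1000
n_samples <- 12
gene_sd <- runif(n_genes, min = 0.05, max = 1)
expr <- matrix(rnorm(n_genes * n_samples, mean = 8,
                     sd = rep(gene_sd, times = n_samples)),
               nrow = n_genes, ncol = n_samples)

# One IQR per gene
IQRs <- apply(expr, 1, IQR)

# Cutoff at the 20th percentile of the per-gene IQRs
cutoff <- quantile(IQRs, probs = 0.20)
keep <- IQRs > cutoff

mean(keep)   # fraction of genes retained, roughly 0.8
expr_filtered <- expr[keep, , drop = FALSE]
```

Changing `probs` to 0.5 gives the median filter, which by construction
keeps about half of the genes whatever the data look like.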

Since the IQRs make up a distribution of numbers, there will
always be a median of that distribution. I think the question
you're asking is: what if a gene at the median IQR is still not variable
enough to matter in a biological context or, conversely, what if in a
system with large changes a median IQR filter removes too many genes
that have large variability?
That is where plotting the data, perhaps against the t-test statistics as
you have suggested, would be a good means of choosing the best filter.
Plotting IQR vs average expression level, or IQR vs standard deviation,
might also help.
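
A minimal sketch of those two diagnostic plots, again on simulated data
(the matrix `expr` is an assumption standing in for your own expression
values). The dashed line marks where a median IQR cutoff would fall.

```r
set.seed(2)

# Simulated log2 expression matrix: 500 genes x 10 samples,
# with a different standard deviation for each gene.
gene_sd <- runif(500, min = 0.05, max = 1)
expr <- matrix(rnorm(500 * 10, mean = 8, sd = rep(gene_sd, times = 10)),
               nrow = 500, ncol = 10)

# Per-gene summaries
IQRs  <- apply(expr, 1, IQR)
means <- rowMeans(expr)
sds   <- apply(expr, 1, sd)

# Two panels side by side: IQR vs mean expression, and IQR vs SD
op <- par(mfrow = c(1, 2))
plot(means, IQRs, pch = 20, xlab = "Mean expression", ylab = "IQR")
abline(h = median(IQRs), lty = 2)   # candidate median-IQR cutoff
plot(sds, IQRs, pch = 20, xlab = "Standard deviation", ylab = "IQR")
abline(h = median(IQRs), lty = 2)
par(op)
```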

Incidentally, I rarely use a variability filter; instead I rely on the
statistics with FDR < 5%, and accept that some of these will be due to
genes with small but consistent differences.
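
To illustrate what I mean, here is a hedged sketch on simulated data, with
no variability filter at all. The group sizes, the 0.4 shift and the use of
plain t-tests with Benjamini-Hochberg adjustment are assumptions made for
the example, not a description of any particular pipeline.

```r
set.seed(3)

# 1000 genes x 12 samples, two groups of 6, low noise on the log2 scale
n_genes <- 1000
group <- rep(c("A", "B"), each = 6)
expr <- matrix(rnorm(n_genes * 12, mean = 8, sd = 0.1), nrow = n_genes)

# Give the first 50 genes a small but consistent shift in group B
expr[1:50, group == "B"] <- expr[1:50, group == "B"] + 0.4

# Per-gene Welch t-test p-values
pvals <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"])$p.value
})

# Benjamini-Hochberg adjustment, then keep genes with FDR < 5%
fdr <- p.adjust(pvals, method = "BH")
significant <- fdr < 0.05

sum(significant)        # number of genes called at FDR < 5%
sum(significant[1:50])  # how many of the truly shifted genes were called
```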

cheers,
Mark

On 24/06/2008, at 3:06 AM, Sim, Fraser wrote:

> Hi Mark,
>
> Am I right in the interpretation that using the median cutoff of the
> distribution of IQRs would remove 50% of the genes in every analysis?
>
> As below:
>
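> library(affy)        # ReadAffy(), rma()
> library(genefilter)  # genefilter()
> # Assumes CEL files in the working directory; rma() is one way to get an ExpressionSet
> eset <- rma(ReadAffy())
> IQRs <- esApply(eset, 1, IQR)   # one IQR per probeset
> f1 <- function(x) ( IQR(x) > median(IQRs) )
> selected <- genefilter(eset, f1)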
>
> What happens if more than 50% of genes are variable or, for that matter,
> less than 50%? Should one plot the IQRs against some value of interest,
> e.g. t-test statistic, and determine the IQR cut-off on that basis?
>
> Thanks, Fraser
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Mark Cowley
> Sent: Sunday, June 22, 2008 7:32 PM
> To: swhwang10 at yahoo.com
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Cutoff to use for IQR filtering in genefilter
>
> Hi Seungwoo,
> The range/IQR/SE/SD of your data is dependent on a number of factors,
> including biological variability, and other sources of technical
> variability, which can include the type of normalisation algorithm
> (think RMA vs MAS5).
> Basically, applying a filter on IQR of 0.1 in my study might remove
> half the genes, whereas in your study it may remove 10% of them.
> Suggestions such as Robert's are useful because they use the IQR of
> YOUR data in order to set that cutoff.
>
> I suggest calculating the IQRs for all of your genes, and then either
> plotting them with plot(density(IQRs)) or just trying summary(IQRs), which
> will give you a good feel for just how variable your data are.
>
> If you need help calculating the IQRs and/or variances of your genes,
> please post back to the list.
>
> cheers,
> Mark
>
> On 22/06/2008, at 9:05 PM, Seungwoo Hwang wrote:
>
>> I am wondering what cutoff value I should use for IQR filtering in
>> genefilter. I did a literature search, and it varies from paper to
>> paper. I have read two papers so far: one used 0.5, the other used
>> 0.18. affylmGUI has options of 0.5, 0.25, and 0.1.
>>
>> I also searched the Bioconductor archive and read that Dr. Robert
>> Gentleman suggested filtering out the genes whose IQR is below the
>> median, rather than below some fixed value.
>>
>> I have two questions in this vein.
>>
>> (1) How small is a gene's variance (in terms of number) if its IQR
>> is some value, say, 0.5 or 0.1? Can I calculate it?
>> (2) When the median is used instead of a fixed number, wouldn't it be too
>> large, since the median of a gene's expression intensities across
>> samples can be anything?
>>
>> Thanks,
>>
>> Seungwoo
>> ------------------------------------
>> Seungwoo Hwang, Ph.D.
>> Senior Research Scientist
>> Korean Bioinformation Center
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ----------------------------------------------------------------------
> Mark Cowley, BSc (Bioinformatics)(Hons)
>
> Peter Wills Bioinformatics Centre
> Garvan Institute of Medical Research
>