[BioC] Cutoff to use for IQR filtering in genefilter
Sim, Fraser
Fraser_Sim at URMC.Rochester.edu
Mon Jun 23 19:06:45 CEST 2008
Hi Mark,
Am I right in interpreting that using the median of the distribution
of IQRs as the cutoff would remove 50% of the genes in every analysis,
as below?
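library(affy)        # ReadAffy(), rma(); also loads Biobase for esApply()
library(genefilter)  # genefilter(), filterfun()
eset <- rma(ReadAffy())  # normalised ExpressionSet; ReadAffy() alone gives probe-level data
IQRs <- esApply(eset, 1, IQR)  # per-gene IQR across arrays
f1 <- function(x) IQR(x) > median(IQRs)
selected <- genefilter(eset, filterfun(f1))  # logical vector, one entry per gene
eset.filt <- eset[selected, ]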
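A quick simulation can check this interpretation. The sketch below uses a made-up matrix in place of exprs(eset) (no real arrays are assumed), in which 70% of the genes are deliberately made variable. Because half of any set of distinct values lies above its median, the median-IQR cutoff keeps exactly half the genes no matter how many are truly variable.

```r
# Simulated log2-scale expression: 1000 hypothetical genes x 12 arrays.
# The data are invented purely to illustrate what a median cutoff does.
set.seed(1)
n.genes <- 1000
expr <- matrix(rnorm(n.genes * 12, mean = 8, sd = 0.3), nrow = n.genes)

# Make the first 700 genes (70%) strongly variable across arrays.
variable <- seq_len(700)
expr[variable, ] <- expr[variable, ] + matrix(rnorm(700 * 12, sd = 1.5), nrow = 700)

# Per-gene IQR, as esApply(eset, 1, IQR) would compute for an ExpressionSet.
IQRs <- apply(expr, 1, IQR)

# Keep genes whose IQR is strictly above the median IQR.
keep <- IQRs > median(IQRs)
frac.kept <- mean(keep)
frac.kept  # exactly 0.5: 500 of 1000 distinct IQRs lie above their median

# Only 500 genes survive, so at least 200 of the 700 variable genes are dropped.
lost.variable <- sum(!keep[variable])
lost.variable
```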
What happens if more than 50% of genes are variable or, for that matter,
fewer than 50%? Should one plot the IQRs against some value of interest,
e.g. a t-test statistic, and determine the IQR cutoff on that basis?
Thanks, Fraser
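The plot Fraser describes could be sketched as follows. This assumes a hypothetical two-group design (6 control vs 6 treated arrays) on simulated data and uses base R's t.test() per gene; genefilter's rowttests() computes row-wise t-statistics much faster on real data. One caution: tuning the cutoff by inspecting the same t-statistics that are later tested uses the data twice, which can bias the resulting p-values.

```r
# Hypothetical two-group design on simulated data: 6 control vs 6 treated arrays.
set.seed(2)
n.genes <- 500
group <- factor(rep(c("control", "treated"), each = 6))
expr <- matrix(rnorm(n.genes * 12, mean = 8, sd = 0.5), nrow = n.genes)

# Shift the first 50 genes in the treated group to create true differences.
expr[1:50, group == "treated"] <- expr[1:50, group == "treated"] + 2

# Per-gene IQR, computed without using the group labels.
IQRs <- apply(expr, 1, IQR)

# Per-gene Welch t-statistic, treated vs control.
tstats <- apply(expr, 1, function(x)
  t.test(x[group == "treated"], x[group == "control"])$statistic)

# Inspect how |t| relates to IQR before settling on a cutoff.
plot(IQRs, abs(tstats), log = "x", xlab = "Per-gene IQR", ylab = "|t| statistic")
abline(v = median(IQRs), lty = 2)  # the median-IQR cutoff
```

In this toy example the shifted genes also have larger IQRs, because mixing two group means widens a gene's spread; that is why a label-free IQR filter tends to retain differentially expressed genes.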
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Mark Cowley
Sent: Sunday, June 22, 2008 7:32 PM
To: swhwang10 at yahoo.com
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Cutoff to use for IQR filtering in genefilter
Hi Seungwoo,
The range/IQR/SE/SD of your data depends on a number of factors,
including biological variability and sources of technical
variability, one of which is the normalisation algorithm
(think RMA vs MAS5).
Basically, an IQR cutoff of 0.1 might remove half the genes in my
study, but only 10% of them in yours.
Suggestions such as Robert's are useful because they use the IQRs of
YOUR data to set the cutoff.
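Mark's point can be illustrated with two simulated studies that differ only in their overall noise level (the standard deviations 0.07 and 0.30 are arbitrary assumptions). A fixed cutoff of 0.1 removes very different fractions of genes in each, while a cutoff taken from each study's own IQR distribution (here, hypothetically, its 25th percentile) removes the same fraction in both by construction.

```r
# Two hypothetical studies with different overall variability (simulated).
set.seed(3)
n.genes <- 2000
study.A <- matrix(rnorm(n.genes * 10, sd = 0.07), nrow = n.genes)  # low-noise study
study.B <- matrix(rnorm(n.genes * 10, sd = 0.30), nrow = n.genes)  # noisier study

iqr.A <- apply(study.A, 1, IQR)
iqr.B <- apply(study.B, 1, IQR)

# Fraction of genes a fixed cutoff of 0.1 removes: far more in A than in B.
fixed.removed <- c(A = mean(iqr.A <= 0.1), B = mean(iqr.B <= 0.1))
fixed.removed

# Data-driven cutoff: the 25th percentile of each study's own IQRs.
cut.A <- quantile(iqr.A, 0.25)
cut.B <- quantile(iqr.B, 0.25)
quantile.removed <- c(A = mean(iqr.A <= cut.A), B = mean(iqr.B <= cut.B))
quantile.removed  # 0.25 in both studies
```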
I suggest calculating the IQRs for all of your genes, and then either
plotting them with plot(density(IQRs)) or simply running summary(IQRs),
which will give you a good feel for just how variable your data is.
If you need help calculating the IQRs and/or variances of your genes,
please post back to the list.
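Calculating per-gene IQRs and variances takes one apply() call each. The sketch below uses a simulated matrix as a stand-in for exprs(eset) (half the genes low-variability, half high-variability, an assumption made only so the density plot has visible structure); with a real ExpressionSet, esApply(eset, 1, IQR) or apply(exprs(eset), 1, IQR) gives the same per-gene values.

```r
# Stand-in for exprs(eset): simulated log2 data, 1000 genes x 8 arrays.
# Rows 1-500 get sd 0.1 and rows 501-1000 get sd 0.8 (sd is recycled by row).
set.seed(4)
expr <- matrix(rnorm(1000 * 8, mean = 7, sd = rep(c(0.1, 0.8), each = 500)),
               nrow = 1000)

IQRs <- apply(expr, 1, IQR)  # per-gene interquartile range
vars <- apply(expr, 1, var)  # per-gene variance, if you prefer variance

summary(IQRs)                      # min, quartiles, median, mean, max
quantile(IQRs, c(0.1, 0.25, 0.5))  # candidate cutoffs read off your own data
plot(density(IQRs), main = "Per-gene IQR")  # two modes in this toy data
```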
cheers,
Mark
On 22/06/2008, at 9:05 PM, Seungwoo Hwang wrote:
> I am wondering what cutoff value I should use for IQR filtering in
> genefilter. I did a literature search, and the value varies from paper
> to paper. Of the two papers I have read so far, one used 0.5 and the
> other used 0.18. affylmGUI offers options of 0.5, 0.25, and 0.1.
>
> I also searched the Bioconductor archive and read that Dr. Robert
> Gentleman suggested filtering out the genes whose IQR is below the
> median, rather than using a fixed value.
>
> I have two questions in this vein.
>
> (1) How small is a gene's variance (in numerical terms) if its IQR
> is some value, say, 0.5 or 0.1? Can I calculate it?
> (2) When the median is used instead of a fixed number, wouldn't it be
> too large, since the median of a gene's expression intensities across
> samples can be anything?
>
> Thanks,
>
> Seungwoo
> ------------------------------------
> Seungwoo Hwang, Ph.D.
> Senior Research Scientist
> Korean Bioinformation Center
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
----------------------------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)
Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research
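Both of Seungwoo's questions can be explored numerically. For question (1), if a gene's log2 expression values are roughly normal (an assumption; real arrays may deviate), then IQR = 2 * qnorm(0.75) * SD, about 1.349 * SD, so SD and variance follow directly from the IQR; on log2 data, an IQR of q also means the middle half of the samples spans a 2^q-fold range. The helper iqr.to.sd() is hypothetical, written only for this sketch. For question (2), the cutoff in the median approach is the median across genes of the per-gene IQRs (a measure of spread), not a median of intensities, and a gene's IQR does not change when its overall expression level is shifted.

```r
# Hypothetical helper: convert an IQR to an SD, assuming normally distributed values.
# For a normal distribution, IQR = 2 * qnorm(0.75) * sd (about 1.349 * sd).
iqr.to.sd <- function(iqr) iqr / (2 * qnorm(0.75))

cutoffs <- c(0.1, 0.18, 0.25, 0.5)        # cutoffs mentioned in this thread
data.frame(IQR      = cutoffs,
           SD       = iqr.to.sd(cutoffs),
           variance = iqr.to.sd(cutoffs)^2,
           foldSpan = 2^cutoffs)           # fold range of the middle 50% on log2 data

# Sanity check by simulation: the IQR of many N(0, 0.2) draws is close to 1.349 * 0.2.
set.seed(5)
sim.iqr <- IQR(rnorm(1e5, sd = 0.2))
sim.iqr

# Question (2): IQR measures spread only, so shifting a gene's level leaves it unchanged.
x <- rnorm(12, mean = 6, sd = 0.3)
all.equal(IQR(x), IQR(x + 5))              # TRUE
```

With few arrays per gene, the sample IQR is itself noisy, so these conversions are rough guides rather than exact variances.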