[BioC] Question on filtering in the Category package
James W. MacDonald
jmacdon at med.umich.edu
Tue Aug 14 16:04:31 CEST 2007
Hi Boel,
Boel Brynedal wrote:
> Dear list,
>
> I have a theoretical question regarding Filtering on Variation in the
> Category package. I've performed an analysis that closely resembles the
> Vignette, but I am still a bit uncertain about the filtering.
> In the Vignette the following code is used:
> lowQ<-rowQ(eset,floor(0.25*NumArrays))
> upQ<-rowQ(eset,ceiling(0.75*NumArrays))
> iqrs<-upQ-lowQ
> select<-(upQ-lowQ)>0.5
>
> My question is, why is this filtering necessary? I have performed my
> analysis without filtering, and the results were strange.
> My guess is that this filtering is intended to eliminate the probe-sets
> that aren't expressed at all (and would cause categories containing them
> to be associated). But the reason for eliminating the probe-sets with
> the highest variability is less clear to me. Would these include
> probe-sets where something has gone wrong, or probe-sets that are not
> expressed at all in some, but not all, arrays?
> What have I missed?
I think you misunderstand the filtering being done here. It doesn't
remove probesets with variance greater than the 75th percentile.
Instead, it keeps probesets whose inter-quartile range (upper quartile
minus lower quartile of a probeset's values across the arrays) is
greater than 0.5, so it is the least variable probesets that get
dropped, not the most variable ones.
The IQR is a non-parametric measure of the variability of each probeset,
and won't be adversely affected by outliers (unless you have lots of
them, in which case they really aren't outliers ;-D).
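
To make the computation concrete, here is a minimal sketch of the
vignette's four lines using base R only. The toy matrix and the helper
rowOrderStat() are made up for illustration; the helper is just a
stand-in for rowQ(), assuming rowQ() returns the requested order
statistic for each row.

```r
# Hypothetical toy data: 2 probesets (rows) x 8 arrays (columns), log2 scale.
x <- rbind(
  flat     = c(7.0, 7.1, 6.9, 7.0, 7.1, 6.9, 7.0, 7.0),
  variable = c(5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5)
)
NumArrays <- ncol(x)

# Stand-in for rowQ(): the k-th smallest value in each row.
rowOrderStat <- function(m, k) apply(m, 1, function(r) sort(r)[k])

lowQ <- rowOrderStat(x, floor(0.25 * NumArrays))    # 2nd smallest of 8
upQ  <- rowOrderStat(x, ceiling(0.75 * NumArrays))  # 6th smallest of 8
iqrs <- upQ - lowQ

# TRUE means the probeset is kept; low-variability rows come out FALSE.
select <- iqrs > 0.5
print(select)
```

With this toy data the "flat" probeset is dropped and the "variable"
one is kept, which is the direction of the filter.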
This is a pretty reasonable way to filter probesets, as it protects
against a single outlier making it look like there is a lot of
variability in the expression values.
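
A quick way to see the outlier point for yourself is the sketch below.
The numbers are invented, and it uses base R's sd() and IQR() rather
than the rowQ() calls from the vignette, so the quartiles are
interpolated slightly differently.

```r
# Hypothetical probeset measured on 8 arrays, essentially unchanged throughout.
quiet <- c(7.0, 7.1, 6.9, 7.0, 7.1, 6.9, 7.0, 7.0)

# Same probeset, but one array gives a wild value.
spiked <- quiet
spiked[8] <- 12.0

# The standard deviation is pulled up a lot by the single outlier...
sd_quiet  <- sd(quiet)
sd_spiked <- sd(spiked)

# ...while the inter-quartile range barely moves.
iqr_quiet  <- IQR(quiet)
iqr_spiked <- IQR(spiked)

print(c(sd_quiet = sd_quiet, sd_spiked = sd_spiked))
print(c(iqr_quiet = iqr_quiet, iqr_spiked = iqr_spiked))

# With a cutoff of 0.5, an sd-based filter would keep the spiked probeset,
# whereas the IQR-based filter still drops it.
keep_by_sd  <- sd_spiked > 0.5
keep_by_iqr <- iqr_spiked > 0.5
```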
Best,
Jim
>
> What kind of filtering are you using, and why?
>
> Is there an article out there discussing the variability, and cause of
> the variability, on arrays?
>
> Any comments would be helpful.
> Thank you!
>
> Best,
> Boel Brynedal
>
>
> --~*~**~***~*~***~**~*~--
> Boel Brynedal, MSc, PhD student
> Karolinska Institutet
> Department of Clinical neuroscience
>
> Karolinska University hospital Huddinge
> Division of Neurology, R54
> 141 86 Stockholm
> SWEDEN
> Phone: +46 8 585 819 27
> Fax: +46 8 585 870 80
> E-mail: boel.brynedal at ki.se
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623