[BioC] Understanding limma, fdr and topTable
Jenny Drnevich
drnevich at illinois.edu
Tue Jul 8 19:17:13 CEST 2008
Hi all,
My views on filtering based on variance run more towards Aaron's. We had
another conversation about this recently on the list
(https://stat.ethz.ch/pipermail/bioconductor/2008-June/022941.html).
Robert Gentleman asked me if I had any evidence to support my
suspicions; I've seen some cases where even throwing out "Absent" probe
sets affects the eBayes calculation of the p-values so that they are
larger than when all probe sets are used. I tried a couple of years
ago to really investigate this, but I couldn't find a good way to
adequately generate microarray data with known numbers of DE genes.
Does anybody know of a good microarray data simulator that gives data
that looks like real data? If so, I could do some playing around with
simulations and present a poster at BioC that we can hack apart :)
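
To make the idea of "data with known numbers of DE genes" concrete, here is a
minimal sketch of a two-group simulator. It is not a realistic microarray
simulator and does not answer the question above: it ignores
intensity-dependent variance, probe-level effects and gene-gene correlation.
The function name, parameters and default values are all assumptions chosen
for illustration. The one modelling choice taken from the eBayes framework is
that gene-wise variances are drawn from a scaled inverse chi-square
distribution.

```python
import random


def simulate_expression(n_genes=1000, n_per_group=4, n_de=100,
                        effect=1.0, d0=4, s0_sq=0.05, seed=1):
    """Simulate log2 expression for two groups with a known set of DE genes.

    Gene-wise variances are drawn as s0_sq * d0 / chisq(d0), a scaled
    inverse chi-square prior. All defaults are arbitrary illustration values.
    Returns (data, de_genes): a list of rows (group A then group B) and the
    set of row indices that are truly differentially expressed.
    """
    rng = random.Random(seed)
    de_genes = set(rng.sample(range(n_genes), n_de))
    data = []
    for g in range(n_genes):
        # A chi-square(d0) draw is a Gamma draw with shape d0/2 and scale 2.
        sigma_sq = s0_sq * d0 / rng.gammavariate(d0 / 2.0, 2.0)
        sd = sigma_sq ** 0.5
        baseline = rng.uniform(4.0, 14.0)  # assumed log2 intensity range
        shift = rng.choice([-effect, effect]) if g in de_genes else 0.0
        group_a = [rng.gauss(baseline, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(baseline + shift, sd) for _ in range(n_per_group)]
        data.append(group_a + group_b)
    return data, de_genes


data, de_genes = simulate_expression()
```

Because the set of truly DE genes is returned alongside the data, any
filtering or testing strategy can be scored against it.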
Cheers,
Jenny
At 10:17 AM 7/8/2008, aaron.j.mackey at gsk.com wrote:
> > I would add that removing those genes that are unchanged in any sample
> > will also help reduce the multiplicity problem. Regardless of the
> > expression level, those genes that never change expression are
> > uninteresting by default, so e.g., if beta-actin is highly expressed at
> > the same level in all samples we don't really care to test for
> > differential expression for that gene since it apparently is not
> > differentially expressed.
>
>This doesn't make sense. How can I choose to filter out "unchanged"
>probesets without fitting a model of some sort, and making a probabilistic
>decision for each probeset about whether it is "unchanged" or not? Every
>probeset (save those below the detection limit) will exhibit variance
>(though the variance may be below the precision of the instrument to
>measure), right? You're not suggesting that there are some probesets with
>zero variance?
>
>It seems to me that this approach leads to a false/erroneous reduction in
>the multiplicity problem, as you've just moved the hypothesis testing into
>a separate "phase" of the analysis. And it also would mess up pooled
>variance estimates such as those used in eBayes-based methods (e.g.
>limma).
>
>So, while I might be willing to filter out known "dead" probesets (that I
>never see above detection threshold over many hundreds of assays), I'm in
>the camp that the statistics are corrupt if you filter without regard to
>its effect on multiplicity corrections.
>
>As an aside, it should be possible to fit some of the models using
>truncated/censored distributions (wherein the statistical model gets to
>know that there were X number of probesets with values < threshold, but
>doesn't pretend that those values are real). That's an idea for the model
>developers to ponder ...
>
>-Aaron
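
The two effects Aaron describes can be illustrated with a small sketch. It
shows only the arithmetic, under stated assumptions: the p-values are made up,
the "filter" simply drops the probesets with the largest p-values, and the
mean of the gene-wise variances is a crude stand-in for the prior variance
that eBayes estimates (limma's actual estimator is different). Whether the
smaller adjusted p-values after filtering are legitimate is exactly what this
thread disputes.

```python
import random
import statistics


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values: three small ones among seven unremarkable ones.
pvals = [0.001, 0.004, 0.012, 0.20, 0.35, 0.48, 0.61, 0.74, 0.86, 0.95]
all_adj = bh_adjust(pvals)       # adjustment over all 10 tests
kept_adj = bh_adjust(pvals[:5])  # adjustment after "filtering" to 5 tests

# Gene-wise variances from a scaled inverse chi-square (arbitrary d0 = 4,
# s0^2 = 0.05), then a filter that keeps the more variable half.
rng = random.Random(0)
variances = [0.05 * 4 / rng.gammavariate(2.0, 2.0) for _ in range(1000)]
kept = sorted(variances)[500:]
mean_all = statistics.mean(variances)
mean_kept = statistics.mean(kept)
```

The same raw p-value receives a smaller adjusted value once fewer tests are
counted, and the retained probesets have a higher average variance than the
full set, so any variance estimate pooled across them may move as well.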
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu