[BioC] gene filtering for limma lmFit

James W. MacDonald jmacdon at med.umich.edu
Fri Oct 16 18:45:37 CEST 2009


Hi Fred,

You are correct that including a bunch of unexpressed genes when 
adjusting for multiplicity will reduce power. However, with limma you 
don't want to remove the 'unexpressed' genes too early. (This doesn't 
apply to 'bad' data, where the spots are demonstrably unreliable for 
some reason or another; those can be dropped up front.)

Remember that the eBayes() step adjusts the denominator of the 
t-statistic using a prior variance estimate that is calculated from 
all the genes under consideration. If you filter out genes before this 
step, the prior is estimated from a selected subset of genes, which can 
bias it.
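
To make the dependence on "all the genes" concrete, here is the 
moderated t-statistic in the usual notation from the limma literature. 
The symbols are not from this thread, so treat them as assumed 
notation: s_g^2 is the residual variance for gene g with d_g degrees 
of freedom, s_0^2 and d_0 are the prior variance and prior degrees of 
freedom, and v_g is the unscaled variance factor of the coefficient.

```latex
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

Since s_0^2 and d_0 are estimated from the s_g^2 of every gene passed 
to eBayes(), the set of genes present at that step affects the 
denominator for each individual gene.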

So the recommended order is: remove demonstrably bad spots first (if 
any), then do the normalization, model fitting, and eBayes() step, and 
only then filter out the genes you consider unexpressed, before doing 
the multiplicity adjustment.
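
A minimal sketch of that order of operations, on simulated data. The 
expression matrix, the two-group design, and the intensity cutoff of 5 
are made up for illustration, so substitute your own normalized data 
and whatever criterion you use to call a gene unexpressed.

```r
library(limma)

set.seed(1)
n_genes   <- 1000
n_samples <- 6

# Hypothetical log2 expression matrix: the first 300 genes sit near
# background, the remaining 700 are clearly expressed.
expr <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = 0.5),
               nrow = n_genes)
expr[1:300, ] <- rnorm(300 * n_samples, mean = 3, sd = 0.5)
rownames(expr) <- paste0("gene", seq_len(n_genes))

# Hypothetical two-group design, three arrays per group.
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Fit and moderate using ALL genes, so the prior variance is
# estimated from the full set.
fit <- lmFit(expr, design)
fit <- eBayes(fit)

# Filter afterwards: keep genes whose mean log2 intensity exceeds an
# arbitrary cutoff (an assumption; choose your own criterion).
keep     <- rowMeans(expr) > 5
fit_kept <- fit[keep, ]

# The BH adjustment is now computed over the retained genes only.
tab <- topTable(fit_kept, coef = 2, adjust.method = "BH", number = Inf)
head(tab)
```

Subsetting the fit object after eBayes() leaves the moderated 
statistics as they were; only the number of tests that go into the 
adjustment changes.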

Best,

Jim



Peng, Fred wrote:
>    Hello all,
>    I have a question about using the limma package to identify
>    differentially expressed genes: should I perform gene filtering after
>    normalization to exclude genes that are likely unexpressed in the samples
>    before fitting the linear model? With my limited stats knowledge, I believe
>    the inclusion of 'unexpressed' genes may affect the BH multiple testing
>    correction by unnecessarily increasing the number of genes being tested.
>    Previously, when I performed a global test (using the globaltest package) on
>    Affy data, however, I found that the gene filtering step had no noticeable
>    effect on the final P-value and therefore was not required, so I wonder
>    whether limma's ability to detect differentially expressed genes would be
>    affected by whether or not 'unexpressed' genes were filtered out.
>    Thanks very much in advance.
>    Fred Peng

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


