[BioC] genefilter vs limma - many probes filtered

Ryan rct at thompsonclan.org
Fri May 23 05:33:49 CEST 2014


Hi Marcin,

I believe that performing variance filtering is not compatible with the 
empirical Bayes methods employed in limma. The point of limma is to 
compute a moderated estimate of each gene's variance by using the 
average variance across all genes as a prior estimate. If you filter 
out genes based on their variance, then you will bias that prior 
estimate, and this bias will propagate to the posterior estimates. For 
example, if you filter out high-variance genes, limma will 
underestimate the prior variance, pull each gene's moderated variance 
too far downward, and so overestimate the significance of your 
differential expression calls, which is not a desirable outcome.
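
To make the direction of that bias concrete, here is a rough Python 
sketch. It is not limma and not how limma estimates its prior: it 
simply takes the mean of the per-gene variances as the prior, as in 
the description above, and uses the usual weighted-average form of a 
moderated variance. The degrees of freedom, the simulated variances, 
and all names below are assumptions made up for illustration.

```python
import random
import statistics

random.seed(0)

D_GENE = 4    # residual degrees of freedom per gene (hypothetical value)
D_PRIOR = 4   # prior degrees of freedom, fixed here by assumption


def sample_variance(true_var, df):
    """Draw a sample variance: true_var times a chi-square(df) draw over df."""
    return true_var * sum(random.gauss(0, 1) ** 2 for _ in range(df)) / df


def moderated(s2, prior, d_gene=D_GENE, d_prior=D_PRIOR):
    """Weighted average of the prior variance and the gene's own variance."""
    return (d_prior * prior + d_gene * s2) / (d_prior + d_gene)


# Simulated per-gene sample variances; every gene has true variance 1.
variances = [sample_variance(1.0, D_GENE) for _ in range(2000)]

# Prior taken from all genes (simplification: plain mean).
prior_all = statistics.mean(variances)

# Drop the 30% of genes with the highest variance, then recompute the prior.
cutoff = sorted(variances)[int(0.7 * len(variances))]
kept = [v for v in variances if v <= cutoff]
prior_kept = statistics.mean(kept)

# Moderate the same gene-level variance under both priors.
example_s2 = statistics.median(kept)
mod_all = moderated(example_s2, prior_all)
mod_kept = moderated(example_s2, prior_kept)

# A moderated t-statistic scales with 1 / sqrt(moderated variance), so this
# ratio is the factor by which |t| grows for this gene after filtering.
ratio = (mod_all / mod_kept) ** 0.5

print("prior from all genes:     ", round(prior_all, 3))
print("prior after filtering:    ", round(prior_kept, 3))
print("inflation factor for |t|: ", round(ratio, 3))
```

In this toy setup the prior computed after filtering is always smaller 
than the prior from all genes, so the same gene gets a smaller 
moderated variance and a larger test statistic. Filtering out the 
low-variance end instead would shift the prior in the opposite 
direction.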

It may be defensible to perform variance filtering after the 
empirical Bayes step, but I'm not sure, and you would have to ask 
someone more knowledgeable about such matters.

-Ryan

On Thu May 22 18:41:24 2014, Marcin Kaminski [guest] wrote:
> Dear list,
> I've followed the tips regarding gene filtering at http://www.bioconductor.org/packages/release/bioc/vignettes/genefilter/inst/doc/independent_filtering.pdf when analyzing GEO data (GSE48060). In this case most probes would pass the tests (for adj.p. < .05) if I filter out roughly 70% of them based on variance, which will triple the number of positives compared to not filtering at all. (related graphic: http://i.imgur.com/RuuvRIo.png)
> Should I be concerned about such extensive filtering? Does it affect further analysis with limma and introduce bias? If it's a problem, what are the available solutions or diagnostics?
>
> Thanks for your help!
>
> Best regards,
> Marcin
>
>
>   -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
> [5] LC_TIME=Polish_Poland.1250
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] RColorBrewer_1.0-5    hgu133plus2.db_2.14.0 org.Hs.eg.db_2.14.0   RSQLite_0.11.4        DBI_0.2-7             AnnotationDbi_1.26.0
>   [7] GenomeInfoDb_1.0.2    genefilter_1.46.1     matrixStats_0.8.14    limma_3.20.3          GEOquery_2.30.0       Biobase_2.24.0
> [13] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.42.0   IRanges_1.22.6    R.methodsS3_1.6.1 RCurl_1.95-4.1    splines_3.1.0     stats4_3.1.0      survival_2.37-7   tools_3.1.0
>   [9] XML_3.98-1.1      xtable_1.7-3