[BioC] genefilter vs limma - many probes filtered
Ryan
rct at thompsonclan.org
Fri May 23 05:33:49 CEST 2014
Hi Marcin,
I believe that performing variance filtering is not compatible with the
empirical Bayes methods employed in limma. The point of limma is to
compute a moderated estimate of each gene's variance by using the
average variance across all genes as a prior estimate. If you filter
out genes based on their variance, then you will bias that prior
estimate, and this bias will propagate to the posterior estimates. For
example, if you filter out high-variance genes, limma will
underestimate the prior variance, and overestimate the significance of
your differential expression calls, which is not a desirable outcome.
It may be defensible to perform variance filtering after the
empirical Bayes step, but I'm not sure, and you would have to ask
someone more knowledgeable about such matters.
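
To make the direction of the bias concrete, here is a rough numerical
sketch in plain Python. It is not limma code. It only uses the standard
moderated-variance formula, in which each gene's posterior variance is a
weighted average of the prior variance s0^2 (weight d0) and the gene's own
sample variance s^2 (weight d). Everything else is an assumption made for
illustration: the data are simulated, the prior is taken as the plain mean
of the sample variances (limma itself estimates s0^2 and d0 by fitting a
scaled F-distribution), the value d0 = 4 is made up, and the function and
variable names are hypothetical.

```python
import random
import statistics


def moderated_variance(s2, d, s0_sq, d0):
    """Weighted average of the prior variance and the gene's own variance."""
    return (d0 * s0_sq + d * s2) / (d0 + d)


rng = random.Random(1)
n_genes, n_samples = 2000, 6   # hypothetical experiment size
d = n_samples - 1              # residual degrees of freedom per gene
d0 = 4.0                       # hypothetical prior degrees of freedom

# Simulate per-gene sample variances for genes whose true variances differ.
sample_vars = []
for _ in range(n_genes):
    true_sd = rng.gammavariate(2.0, 0.5) ** 0.5
    values = [rng.gauss(0.0, true_sd) for _ in range(n_samples)]
    sample_vars.append(statistics.variance(values))

# 70th percentile of the sample variances, used as the filtering cutoff.
cutoff = statistics.quantiles(sample_vars, n=10)[6]
low_only = [v for v in sample_vars if v <= cutoff]   # high-variance genes removed
high_only = [v for v in sample_vars if v > cutoff]   # low-variance genes removed

# Simplified stand-in for the prior: the mean variance of the genes kept.
prior_all = statistics.fmean(sample_vars)
prior_low = statistics.fmean(low_only)
prior_high = statistics.fmean(high_only)

# Moderated variance of one gene sitting at the cutoff, under each prior.
gene_s2 = cutoff
for label, prior in [("no filtering", prior_all),
                     ("high-variance genes removed", prior_low),
                     ("low-variance genes removed", prior_high)]:
    moderated = moderated_variance(gene_s2, d, prior, d0)
    print(f"{label:28s} prior={prior:.3f} moderated={moderated:.3f}")
```

Under these assumptions, removing the high-variance genes pulls the prior
down, so the same gene gets a smaller moderated variance and therefore a
larger moderated t-statistic. Removing the low-variance genes pushes the
prior in the opposite direction. Either way, the prior no longer reflects
the full set of genes.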
-Ryan
On Thu May 22 18:41:24 2014, Marcin Kaminski [guest] wrote:
> Dear list,
> I've followed the tips regarding gene filtering at http://www.bioconductor.org/packages/release/bioc/vignettes/genefilter/inst/doc/independent_filtering.pdf when analyzing GEO data (GSE48060). In this case most probes would pass the tests (for adj.p. < .05) if I filter out roughly 70% of them based on variance, which will triple the number of positives compared to not filtering at all. (related graphic: http://i.imgur.com/RuuvRIo.png)
> Should I be concerned about such extensive filtering? Does it affect further analysis with limma and introduce bias? If it's a problem, what are the available solutions or diagnostics?
>
> Thanks for your help!
>
> Best regards,
> Marcin
>
>
> -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
> [5] LC_TIME=Polish_Poland.1250
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] RColorBrewer_1.0-5 hgu133plus2.db_2.14.0 org.Hs.eg.db_2.14.0 RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.26.0
> [7] GenomeInfoDb_1.0.2 genefilter_1.46.1 matrixStats_0.8.14 limma_3.20.3 GEOquery_2.30.0 Biobase_2.24.0
> [13] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.42.0 IRanges_1.22.6 R.methodsS3_1.6.1 RCurl_1.95-4.1 splines_3.1.0 stats4_3.1.0 survival_2.37-7 tools_3.1.0
> [9] XML_3.98-1.1 xtable_1.7-3
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor