[BioC] total count filter cutoff

Wolfgang Huber whuber at embl.de
Mon May 5 23:33:56 CEST 2014


This thread has accumulated a good number of opinions and speculations about what the best filter criterion and cutoff value are.
The "genefilter" vignette "Diagnostics for independent filtering" [1], which I mentioned previously, provides rational criteria for choosing both in a data-dependent manner.
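
To make "data-dependent" concrete, here is a minimal sketch of one such diagnostic: for each candidate cutoff, drop the genes below it, apply the Benjamini-Hochberg adjustment to the surviving p-values, and count the rejections. It is plain Python for illustration only, not the genefilter API; the function names, the toy counts and p-values, and the 5% FDR level are all assumptions. The approach also presumes that the filter statistic (here the total count) is independent of the test statistic under the null hypothesis.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_by_cutoff(filter_stat, pvalues, cutoffs, alpha=0.05):
    """Count BH rejections among genes with filter_stat >= cutoff, per cutoff."""
    counts = {}
    for cutoff in cutoffs:
        kept = [p for s, p in zip(filter_stat, pvalues) if s >= cutoff]
        counts[cutoff] = sum(1 for q in bh_adjust(kept) if q <= alpha)
    return counts


# Hypothetical toy data: total read count and raw p-value for ten genes.
total_counts = [2, 3, 5, 8, 40, 55, 70, 90, 120, 300]
pvalues = [0.90, 0.60, 0.75, 0.40, 0.004, 0.03, 0.012, 0.5, 0.022, 0.001]

result = rejections_by_cutoff(total_counts, pvalues, cutoffs=[0, 10, 100])
print(result)
```

With this toy data the number of rejections is 3 without filtering, 5 at a cutoff of 10, and 2 at a cutoff of 100: a moderate filter removes uninformative tests and eases the multiple-testing burden, while an overly strict one discards true positives. Plotting this count against the cutoff (or the fraction of genes removed) is one way to choose the cutoff from the data.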

Kind regards
	Wolfgang

[1] http://bioconductor.org/packages/release/bioc/html/genefilter.html

On 30 Apr 2014, at 23:25, Steve Lianoglou <lianoglou.steve at gene.com> wrote:

> Hi,
> 
> On Wed, Apr 30, 2014 at 1:11 PM, Ryan C. Thompson <rct at thompsonclan.org> wrote:
>> Filtering on raw counts has a statistical motivation, i.e. something like
>> "we can't do statistics with less than X reads". Filtering on CPM is
>> sometimes just used as a proxy for count-based filtering, but sometimes it
>> also has a biological motivation, i.e. "we believe that CPM < X represents
>> biological noise transcription rather than genuine regulated transcription
>> relevant to the biological system in question". So you have to consider what
>> your goals are for filtering and choose an appropriate method.
> 
> Even still, in the "biological motivation" case: if you want to use
> CPM, shouldn't you really prefer {R|F}PKM so you don't "enrich" for
> removal of lowly expressed short transcripts while letting lowly
> expressed long transcripts slip through?
> 
> -steve
> 
> -- 
> Steve Lianoglou
> Computational Biologist
> Genentech
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


