[BioC] total count filter cutoff

Mark Robinson mark.robinson at imls.uzh.ch
Wed Apr 30 21:34:50 CEST 2014


In my lab, we typically follow a "CPM of at least X in at least Y samples" rule, where X = 1 (arbitrary but reasonable; it can be changed) and Y = the size of the smallest replicate group. This follows one of the case studies in the edgeR user's guide, for example:

------
4.3.6 Filtering
We filter out very lowly expressed tags, keeping genes that are expressed at a reasonable level in at least one treatment condition. Since the smallest group size is three, we keep genes that achieve at least one count per million (cpm) in at least three samples:

> keep <- rowSums(cpm(y)>1) >= 3
> y <- y[keep,]
------

(http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)
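
For anyone who wants to see what the rule quoted above does without going through a DGEList, here is a minimal base-R sketch. It is only an illustration under some assumptions: the helper name filter_by_cpm, the toy counts and the group labels are all made up, and CPM is computed simply as count / column total * 1e6, whereas edgeR's cpm() works from the library sizes stored in the DGEList.

```r
# Hypothetical helper (not part of edgeR): keep rows whose CPM exceeds
# min_cpm in at least as many samples as the smallest replicate group has.
filter_by_cpm <- function(counts, group, min_cpm = 1) {
  lib_size <- colSums(counts)              # total counts per sample
  cpm <- t(t(counts) / lib_size) * 1e6     # counts per million, sample by sample
  min_samples <- min(table(group))         # Y = size of the smallest group
  rowSums(cpm > min_cpm) >= min_samples
}

# Toy data (made-up numbers): 6 samples, two groups of three.
others <- rbind(
  geneA = c(5, 5, 5, 0, 0, 0),   # above 1 CPM in one whole group
  geneB = c(5, 0, 0, 0, 0, 0),   # above 1 CPM in a single sample only
  geneC = c(0, 0, 0, 0, 0, 0)    # never detected
)
# A filler row pads every library to exactly 1e6 counts, so CPM equals the raw count.
counts <- rbind(others, filler = 1e6 - colSums(others))
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

keep <- filter_by_cpm(counts, group)
filtered <- counts[keep, , drop = FALSE]
print(keep)
```

In this toy example geneA and the filler row are kept, while geneB and geneC are dropped. For scale: if all of the roughly 80,000,000 reads per sample mentioned in the original question were counted, a CPM of 1 would correspond to a raw count of about 80 in a sample.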

Cheers, Mark


----------
Prof. Dr. Mark Robinson
Statistical Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://ow.ly/riRea







On 30.04.2014, at 21:23, "Ryan C. Thompson" <rct at thompsonclan.org> wrote:

> Dear Mahnaz,
> 
> Total count filtering and mean count filtering are equivalent, since the only difference is a constant factor (dividing by number of samples), so the mean count filter demonstrated in the genefilter vignette corresponds to your question.
> 
> If you are expecting the vignette to simply give you a specific number to use as a cutoff, that's not possible, because the threshold depends on the data. I suggest that you adapt the R code in this vignette to your data in order to choose an appropriate cutoff.
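
The equivalence of total count and mean count filtering described above can be checked with a small base-R sketch. The count matrix and the cutoff of 30 are made-up values for illustration only; the point is just that dividing both the row totals and the cutoff by the number of samples selects the same genes.

```r
# Made-up counts: 5 genes x 4 samples.
counts <- matrix(
  c( 0,  1,  0,  2,
    10, 12,  8, 11,
     3,  0,  4,  1,
    50, 40, 60, 45,
     7,  9,  6,  8),
  nrow = 5, byrow = TRUE
)

n <- ncol(counts)        # number of samples
total_cutoff <- 30       # arbitrary threshold on the row total

# Filter on total counts, then on mean counts with the cutoff rescaled by n.
keep_total <- rowSums(counts) >= total_cutoff
keep_mean  <- rowMeans(counts) >= total_cutoff / n

# Both filters select exactly the same rows.
same <- identical(keep_total, keep_mean)
print(same)
```

So a total count cutoff of T on n samples behaves like a mean count cutoff of T / n, and any diagnostic plotted against mean counts can be read as one against total counts after rescaling the axis.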
> 
> -Ryan
> 
> On Wed 30 Apr 2014 12:04:33 PM PDT, Mahnaz Kiani wrote:
>> Thanks for the quick response. I did check that but didn't find any information
>> about a total count filter cutoff. Would you please help me with that?
>> 
>> Thanks,
>> Mahnaz
>> 
>> 
>> On Wed, Apr 30, 2014 at 1:47 PM, Wolfgang Huber <whuber at embl.de> wrote:
>> 
>>> Dear Mahnaz
>>> http://bioconductor.org/packages/release/bioc/html/genefilter.html ->
>>> Diagnostics for independent filtering -> Section 4 provides some options.
>>>         Wolfgang
>>> 
>>> On 30 Apr 2014, at 20:29, mahnaz Kiani [guest] <
>>> guest at bioconductor.org> wrote:
>>> 
>>>> 
>>>> I'm using edgeR for analysis of my data and I'm not sure what total count
>>>> filter cutoff value I should use. My reads are paired 50 bp reads and the
>>>> total reads per sample is about 80,000,000. I tried cutoff values of 5, 10,
>>>> 15, 30, 50 and 100 and I only saw differences between 50 and 100, but I am
>>>> still looking for a logical reason to choose the cutoff value.
>>>> 
>>>> Appreciate your help,
>>>> Mahnaz
>>>> 
>>>> -- output of sessionInfo():
>>>> 
>>>> R 3.0.2
>>>> 
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 
>>> 
>> 
> 


