[BioC] edgeR cpm filtering
    John [guest] 
    guest at bioconductor.org
       
    Mon Feb 11 17:54:54 CET 2013
    
    
  
All,
I am a new edgeR user and I have difficulty understanding the "cpm" function of the edgeR package. I understand that each count is divided by the total library size and then multiplied by 1,000,000. But why 1M? What is the logic behind using 1M? Is it 1M reads, or bases? Why not 10M, or 1000? Is there any specific reason for using 1M?
Another issue I have is how to enforce filtering of genes that have 0 reads in one group of samples but a very large number of reads in the other groups. Here is an example:
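To make the calculation described above concrete, here is a minimal base-R sketch of "divide by library size, multiply by a constant". The count matrix, the gene and sample names, and the variables `cpm_manual` and `cpk_manual` are invented for illustration, and the sketch ignores any normalization factors that edgeR's own cpm function may apply. It only shows that the constant sets the unit (per million reads versus per thousand reads), so a filter gives the same result as long as the threshold is rescaled with it.

```r
# Toy count matrix: 3 genes x 2 samples (made-up values, for illustration only)
counts <- matrix(c(150,  0,  850,
                   300, 20, 1680),
                 nrow = 3,
                 dimnames = list(c("gene_a", "gene_b", "gene_c"),
                                 c("s1", "s2")))

# Library size = total reads per sample (column sums): 1000 and 2000 here
lib_size <- colSums(counts)

# Counts per million: each column divided by its library size, times 1e6
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Same calculation with a different constant: counts per thousand
cpk_manual <- t(t(counts) / lib_size) * 1e3

# The two scales differ only by a factor of 1000, so a threshold of
# 5000 on the per-million scale matches a threshold of 5 per thousand
same_filter <- identical(cpm_manual > 5000, cpk_manual > 5)
print(cpm_manual)
print(same_filter)
```

With these toy numbers, gene_a has the same value in both samples on the per-million scale even though its raw counts differ, because the second library is twice as large.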
Sample                  Gene_X
Sample 1-replicate 1    150
Sample 1-replicate 2    100
Sample 2-replicate 1    270
Sample 2-replicate 2    320
Sample 3-replicate 1    0
Sample 3-replicate 2    0
I used:
d_DGEList  <- d_DGEList[rowSums(cpm_filtered > 5) > 2,]
But Gene_X is still not filtered out. Many genes with low read counts are removed, but a few like Gene_X remain. I think the many reads mapped in samples 1 and 2 qualify it to pass the cpm filter. How can I filter genes like this? Is it OK to delete such cases manually?
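The behaviour described above can be reproduced with a small base-R sketch. The library sizes are not given in this post, so 10 million reads per sample is an assumption made purely for illustration, as are the column names and the variables `cpm_x`, `group` and `keep_all_groups`. The sketch shows that a rule of the form "more than 2 samples above 5 cpm" counts samples across all groups together, and then shows one hypothetical per-group rule that would drop a gene with no sample above the threshold in some group. It is not an edgeR function or a recommended procedure.

```r
# Gene_X counts from the example above (one gene, six samples)
gene_x <- matrix(c(150, 100, 270, 320, 0, 0), nrow = 1,
                 dimnames = list("Gene_X",
                                 c("S1_r1", "S1_r2", "S2_r1",
                                   "S2_r2", "S3_r1", "S3_r2")))

# ASSUMPTION: 10 million reads per library (not stated in the post)
lib_size <- rep(1e7, 6)

# Counts per million, computed by hand
cpm_x <- t(t(gene_x) / lib_size) * 1e6

# The filter used above: keep genes with cpm > 5 in more than 2 samples,
# counted over all samples regardless of group
n_above <- rowSums(cpm_x > 5)
keep_original <- n_above > 2

# Hypothetical per-group rule: require at least one sample above the
# threshold in every group
group <- factor(c("S1", "S1", "S2", "S2", "S3", "S3"))
above_by_group <- t(rowsum(t((cpm_x > 5) * 1), group))
keep_all_groups <- rowSums(above_by_group > 0) == nlevels(group)

print(n_above)
print(keep_original)
print(keep_all_groups)
```

Under the assumed library sizes, four of the six samples exceed 5 cpm, so Gene_X passes the original filter, while the per-group rule rejects it because neither replicate of sample 3 is above the threshold.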
Thank you.
John
 -- output of sessionInfo(): 
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] edgeR_2.6.0  limma_3.12.0
>
--
Sent via the guest posting facility at bioconductor.org.
    
    