[BioC] edgeR dataset filtering using pnas_expression.txt
Dave Tang
davetingpongtang at gmail.com
Wed Jan 4 15:04:01 CET 2012
Hi list,
Just a question regarding edgeR and dataset processing/filtering prior to
calling differential expression.
Case Study 12 (RNA-seq of Hormone-Treated LNCaP Cells) from the edgeR
manual mentions that:
"We filter out lowly expressed tags and those which are only expressed in
a small number of samples. We keep only those tags that have at least one
count per million in at least three samples."
Section 6 of the manual then states that:
"The edgeR methodology needs to work with the original digital expression
counts, so these should not be transformed in any way by users prior to
analysis. edgeR automatically takes into account the total size (total
read number) of each library in all calculations of fold-changes,
concentration and statistical significance."
My question is whether filtering counts as "transforming" the data. Since
filtering would change the total size of each library, and thus affect all
downstream calculations, is it OK to use such filters? And what should one
be cautious about when applying such filters (e.g. at least x counts in at
least n samples) before performing the edgeR analysis?
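The library-size concern can be made concrete by comparing each sample's total count before and after the filtered tags are removed. The sketch below is Python rather than R and uses an invented matrix; `library_sizes`, `relative_loss`, and the `kept` indices are hypothetical. It does not show which total edgeR itself uses after filtering; it only measures how far the two totals differ.

```python
# Toy sketch: compare each sample's total count before and after dropping
# filtered tags. Assumption: invented matrix (rows = tags, columns = samples);
# which total edgeR actually uses downstream is not modelled here.

def library_sizes(counts):
    """Total count per sample (column sums of a tags x samples matrix)."""
    return [sum(column) for column in zip(*counts)]


def relative_loss(counts, kept):
    """Fraction of each library's reads that belongs to the dropped tags."""
    before = library_sizes(counts)
    after = library_sizes([counts[i] for i in kept])
    return [(b - a) / b for b, a in zip(before, after)]


counts = [
    [600000, 600000, 600000, 600000],
    [399990, 399992, 399995, 399998],
    [5, 3, 2, 0],
    [5, 5, 0, 0],
    [0, 0, 3, 2],
]
kept = [0, 1, 2]  # hypothetical result of an ">= 1 CPM in >= 3 samples" filter

print(library_sizes(counts))                     # [1000000, 1000000, 1000000, 1000000]
print(library_sizes([counts[i] for i in kept]))  # [999995, 999995, 999997, 999998]
print(relative_loss(counts, kept))
```

In this toy case the dropped tags hold only a few reads per million, so the totals barely move; with real data the difference depends on how many reads the removed tags carry.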
Many thanks,
--
Dave