[BioC] The difference between three methods in calcNormFactors() in edgeR
Zhan Tianyu
sewen67 at gmail.com
Fri Jul 4 15:32:07 CEST 2014
Hello all,
I have a question concerning calcNormFactors() in edgeR. There
are three methods that I could choose from: "TMM", "RLE", and
"upperquartile". I am wondering how I should decide which one to use.
For example, consider a simple case: there are 10 genes in total and two
groups of 4 individuals each. The count data are therefore a 10*8 matrix,
where each row is a gene, each column is an individual, columns 1-4 are
the first group, and columns 5-8 are the second group. Of the 10 genes,
60% are differentially expressed: the counts of genes 3, 4, 5, 6, 8, and 9
are doubled in the first group, while the others are the same in both
groups. Please see the attachment for the count data.
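For readers without the attachment, a matrix with the same layout can be simulated roughly as follows. This is only a sketch: the baseline means in `mu`, the Poisson noise, and the seed are assumptions chosen for illustration and are not the attached data.

```r
# Simulated stand-in for the attached counts (hypothetical values).
set.seed(42)                                   # reproducible draws
n_genes     <- 10
n_per_group <- 4
de_genes    <- c(3, 4, 5, 6, 8, 9)             # genes doubled in the first group

mu   <- seq(1e5, 5.5e5, length.out = n_genes)  # hypothetical baseline means
fold <- rep(1, n_genes)
fold[de_genes] <- 2

# Expected counts: columns 1-4 are the first group, columns 5-8 the second.
mu_mat <- cbind(matrix(mu * fold, n_genes, n_per_group),
                matrix(mu,        n_genes, n_per_group))

# Poisson noise around the expected counts, filled column by column.
counts <- matrix(rpois(length(mu_mat), lambda = mu_mat), nrow = n_genes,
                 dimnames = list(paste0("Gene", 1:n_genes),
                                 paste0("Sample", 1:(2 * n_per_group))))
```

With means this large the Poisson noise is small, so the columns within each group are nearly identical, as in the example described above.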
Then I generated the "group" factor via this command:
> grp <- as.factor(rep(0:1, each = 8/2))
After that, I generated the DGEList by:
> d <- DGEList(counts = counts, group = grp )
Then I calculated the normalization factor by edgeR:
> n <- calcNormFactors(d)
By default, this function uses the "TMM" method. However, the
normalization factors look like this:
        group lib.size    norm.factors
Sample1     0  5062446 1.1195829383593
Sample2     0  5062340 0.8154739771400
Sample3     0  5062444 1.1195827474525
Sample4     0  5062466 1.1403164060313
Sample5     1  3000123 0.9624162935534
Sample6     1  2999992 0.9624163157255
Sample7     1  2999977 0.9624169648716
Sample8     1  3000156 0.9624160077253
I think this is odd, because the normalization factors for individuals
1 and 2 are quite different (1.11958 and 0.81547), even though their
counts are essentially the same (please see the attachment for the
count data).
Then I tried the RLE method:
> n <- calcNormFactors(d, method = "RLE")
The results are:
$samples
        group lib.size    norm.factors
Sample1     0  5062446 1.0886765699045
Sample2     0  5062340 1.0886508565338
Sample3     0  5062444 1.0886766741626
Sample4     0  5062466 1.0886750099086
Sample5     1  3000123 0.9185446848068
Sample6     1  2999992 0.9185578680804
Sample7     1  2999977 0.9185624609049
Sample8     1  3000156 0.9185437155777
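To see where numbers like these come from, the RLE factors can be approximated by hand in base R. The sketch below reflects my understanding of the median-of-ratios idea behind method="RLE" and is not edgeR's own code; the helper name `rle_factors` and the noise-free counts are assumptions made for illustration, not the attached data.

```r
# Hypothetical noise-free counts: 10 genes x 8 samples (NOT the attached data).
base <- seq(1e5, 5.5e5, length.out = 10)
de   <- c(3, 4, 5, 6, 8, 9)            # genes doubled in the first group
grp0 <- base
grp0[de] <- 2 * grp0[de]
counts <- cbind(matrix(grp0, 10, 4), matrix(base, 10, 4))

# Hypothetical helper: median-of-ratios factors, rescaled to multiply to 1.
rle_factors <- function(counts) {
  lib_size <- colSums(counts)
  geo_mean <- exp(rowMeans(log(counts)))   # per-gene geometric mean as reference
  # Median ratio of each sample to the reference, skipping genes with a zero count.
  size <- apply(counts, 2, function(x) median((x / geo_mean)[geo_mean > 0]))
  f <- size / lib_size                     # express relative to library size
  f / exp(mean(log(f)))                    # rescale so the factors multiply to 1
}

nf <- rle_factors(counts)
print(round(nf, 5))
```

Because the columns within each group are identical in this toy matrix, every sample in a group gets the same factor, which matches the pattern in the RLE output above.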
I think the results are more reasonable this time. My questions are:
how should I decide which method to use, and why does TMM give such an
odd result?
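One way to look at the three methods side by side is sketched below. It assumes edgeR is installed (the block is skipped otherwise), and the `counts` matrix is simulated with assumed baseline means rather than taken from the attachment, so the factors will not match the numbers quoted above.

```r
# Sketch only: `counts` is simulated here, NOT the attached data.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  mu   <- seq(1e5, 5.5e5, length.out = 10)     # hypothetical baseline means
  fold <- rep(1, 10)
  fold[c(3, 4, 5, 6, 8, 9)] <- 2               # genes doubled in the first group
  lambda <- cbind(matrix(mu * fold, 10, 4), matrix(mu, 10, 4))
  counts <- matrix(rpois(80, lambda), nrow = 10,
                   dimnames = list(paste0("Gene", 1:10), paste0("Sample", 1:8)))

  grp <- factor(rep(0:1, each = 4))
  d   <- edgeR::DGEList(counts = counts, group = grp)

  # One column of normalization factors per method.
  norm_methods <- c("TMM", "RLE", "upperquartile")
  factors <- sapply(norm_methods, function(m)
    edgeR::calcNormFactors(d, method = m)$samples$norm.factors)
  rownames(factors) <- colnames(counts)
  print(round(factors, 4))
}
```

Each column holds the factors from one method, so samples that disagree across methods, or within a group, are easy to spot.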
Thank you.
Best regards,
sewen67