[BioC] DESeq and EdgeR log fold differences
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Nov 30 03:15:53 CET 2012
Dear Ioannis,
Questions like this require reproducible code examples that other people
can run for themselves to confirm the results. See the posting guide.
As it is, the inconsistencies in the three sets of output you give
(samples in different orders, groups with different names, same miR with
different IDs, logFC of different signs, etc) do not give a reader any
confidence that the analyses are actually comparable. According to the
DESeq output you give, this particular miR is much lower in conditionA
(wildtype) than in conditionB (knockout), but the raw data you give show
very clearly that the opposite is true.
It may be that you have simply given different data to the two packages or
else have aligned the results incorrectly.
Best wishes
Gordon
---------------- original message -----------------
[BioC] DESeq and EdgeR log fold differences
Ioannis Vlachos iv at on.gr
Thu Nov 29 23:06:17 CET 2012
Hello everyone,
I thought of conducting a parallel DE analysis with EdgeR and DESeq using
a dataset that I have been working on lately.
The dataset has two conditions with two biological replicates each.
Let's say: Wild Type, Wild Type, Knock Out, Knock Out.
It's a smallRNA-Seq dataset, mapped to miRNAs.
I have tried various analyses using both programs and I have noticed this.
There are very large differences in fold changes for some miRNAs between
the two programs, even when using "RLE" for EdgeR normalization.
Example:
DESeq Code:
countDataSet = newCountDataSet (DATA, condition)
countDataSet = estimateSizeFactors(countDataSet)
countDataSet = estimateDispersions(countDataSet)
difexp = nbinomTest (countDataSet, "WildType", "KnockOut")
one of the results is:
id baseMean baseMeanA baseMeanB foldChange
log2FoldChange pval padj
100 623.8597966 349.3576527 898.3619406 2.57146776
1.362592066 0.001303353 0.182310802
And the size factors for DESeq are:
sizeFactors(countDataSet)
KO1 KO2 WT1 WT2
1.2969960 1.052 0.8850 0.84442
OK. So far so good.
EdgeR now.
dge <- DGEList(counts=DATA, group=condition)
dge<- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, verbose=TRUE)
dge <-estimateTagwiseDisp(dge, verbose=TRUE)
et<- exactTest(dge)
Which results in:
logFC logCPM PValue FDR
100 -2.750 5.814103 9.40E-09 1.20E-05
With:
dge$samples
group lib.size norm.factors
WT1 1 2796302 0.9922204
WT2 1 2610244 0.9928183
KO1 2 3999488 1.0248098
KO2 2 3349646 0.9905555
We have logFC 1.3 for DESeq and 2.75 in EdgeR
And these results remain practically the same even by using:
dge<- calcNormFactors(dge.RLE, method="RLE")
logFC logCPM PValue FDR
210 -2.775952 5.823856 8.047631e-09 1.030902e-05
group lib.size norm.factors
KnockOut 3999488 1.0147156
KnockOut 3349646 0.9830163
WildType 2796302 0.9903844
WildType 2610244 1.0122579
Any thoughts?
This entry has (raw tags):
KO KO WT WT
131 123 287 195
Any thoughts on why I get 1.3 lFC vs 2.7lFC?
Thanks a lot,
Best Regards,
Ioannis
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list