[BioC] (no subject)
Liu, XiaoChuan
xiaochuan.liu at mssm.edu
Fri Mar 8 16:54:55 CET 2013
Dear Dr. Simon Anders,
I have some questions on the multi-factor test in DESeq.
Here I have total 8 RNA-seq samples. There are two factors: condition(Wildtype and Knockout), and treatment(Picrotoxin and Control). And each has two biological replicates. Details I have attached (see Table 1 in attachment).
Now I want to use the generalized linear models to do the ANOVA-like analysis for these two factors and want to find the differentially expressed genes caused by the two factors. The part 2 in attachment are the different GLMs tests by DESeq. And the cut-off are chosen as P-value<0.05 and FDR<0.05 separately. The number of significant differentially expressed genes are counted by cut-offs. Details see Part 2 in attachment.
I also pick up raw counts for all the genes, then I try to use the function getVarianceStabilizedData() to transfer the raw counts to variance stabilizing transformation (VST). After that, I directly use the ANOVA function in R to calculate the P-value for each genes based on the VST value. When I get all P-value for all genes, I also do the p.adjust to calculate the FDR by method = "BH". Results see Part 3 in attachment.
Here are my questions:
1. What is the meaning when I use “count ~ 1”? Here 1 is a cut-off? Or other meaning? I saw you give an example like this in DESeq Reference Manual. So I try to follow using it. But I do not know the meaning for test.
2. In Part 2, I did 6 different GLMs tests by DESeq. And I also do the overlap amongst results. The overlap sometimes is very small. Why are they so different? How to explain it? Could you give me some comments?
3. In Part 3, I also do the overlap with 6 results in Part 2. But the overlap are very small. I wonder if I make a mistake to misuse the variance stabilizing transformation? If I want to directly use the ANOVA function in R to calculate co-factor P-value, could I use the raw count? Or How to normalize the raw counts then I can use ANOVA function in R?
Thank you very much!
Looking forward to your response!
Best,
Leo
1. Table1
condition treatment
WP1 Wildtype Picrotoxin
WC1 Wildtype Control
KP1 Knockout Picrotoxin
KC1 Knockout Control
WP2 Wildtype Picrotoxin
WC2 Wildtype Control
KP2 Knockout Picrotoxin
KC2 Knockout Control
2. Multi-factor test
fit1 = fitNbinomGLMs( cds, count ~ treatment + condition )
fit0 = fitNbinomGLMs( cds, count ~ treatment )
pvalsGLM1 = nbinomGLMTest( fit1, fit0 )
padjGLM1 = p.adjust( pvalsGLM1, method="BH" )
Dataset1
P-value<0.05: 615
FDR<0.05: 20
fit3 = fitNbinomGLMs( cds, count ~ condition + treatment )
fit2 = fitNbinomGLMs( cds, count ~ condition )
pvalsGLM2 = nbinomGLMTest( fit3, fit2 )
padjGLM2 = p.adjust( pvalsGLM2, method="BH" )
Dataset2
P-value<0.05: 1959
FDR<0.05: 870
fit1 = fitNbinomGLMs( cds, count ~ treatment + condition )
fit4 = fitNbinomGLMs( cds, count ~ 1 )
pvalsGLM3 = nbinomGLMTest( fit1, fit4 )
padjGLM3 = p.adjust( pvalsGLM3, method="BH" )
Dataset3
P-value<0.05: 1767
FDR<0.05: 778
fit5 = fitNbinomGLMs( cds, count ~ condition * treatment )
fit0 = fitNbinomGLMs( cds, count ~ treatment )
pvalsGLM4 = nbinomGLMTest( fit5, fit0 )
padjGLM4 = p.adjust( pvalsGLM4, method="BH" )
Dataset4
P-value<0.05: 393
FDR<0.05: 26
fit5 = fitNbinomGLMs( cds, count ~ condition * treatment )
fit2 = fitNbinomGLMs( cds, count ~ condition )
pvalsGLM5 = nbinomGLMTest( fit5, fit2 )
padjGLM5 = p.adjust( pvalsGLM5, method="BH" )
Dataset5
P-value<0.05: 1450
FDR<0.05: 665
fit5 = fitNbinomGLMs( cds, count ~ condition * treatment )
fit4 = fitNbinomGLMs( cds, count ~ 1 )
pvalsGLM6 = nbinomGLMTest( fit5, fit4 )
padjGLM6 = p.adjust( pvalsGLM6, method="BH" )
Dataset6
P-value<0.05: 1465
FDR<0.05: 657
Overlap(P-value<0.05)
Dataset1
Dataset2
Dataset3
Dataset4
Dataset5
Dataset6
Dataset1
Dataset2
240
Dataset3
455
1511
Dataset4
356
178
342
Dataset5
220
1423
1378
184
Dataset6
396
1282
1432
328
1247
Overlap(FDR<0.05)
Dataset1
Dataset2
Dataset3
Dataset4
Dataset5
Dataset6
Dataset1
Dataset2
10
Dataset3
20
720
Dataset4
18
15
26
Dataset5
9
658
652
15
Dataset6
20
618
653
26
602
3. Two way ANOVA After VST:
P-value<0.05: 671
FDR<0.05: 0
Overlap(P-value<0.05)
Dataset1
Dataset2
Dataset3
Dataset4
Dataset5
Dataset6
Two way ANOVA After VST
35
77
68
34
69
69
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Anova.pdf
Type: application/pdf
Size: 106447 bytes
Desc: Anova.pdf
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130308/e5c68473/attachment.pdf>
More information about the Bioconductor
mailing list