[BioC] 2 issues about enriched gene sets via Roast
julie.leonard at syngenta.com
Fri Jan 24 16:19:35 CET 2014
Same questions minus the unformatted table data.
Hi. I am using roast to perform gene set enrichment analysis after doing differential expression analysis in edgeR. In this particular study I had two variables, which I combined into a single factor in the linear model: y ~ 0 + combo_variable. Most of the variance in the data was due to one variable, with very little due to the other. As a result, a large number of genes (~10,000) were found to be differentially expressed when testing the contrast for one variable, and very few (~200) when testing the contrast for the other. When I ran roast for each of these two contrasts, the one with many differentially expressed genes found almost all of the gene sets to be enriched. That is understandable given how many genes were differentially expressed, but my problem is that most of the gene sets had the same FDR, so I can't even narrow down the list of enriched gene sets by using a more stringent FDR cutoff. Why would all of these gene sets have the same p-values, and therefore the same FDRs?
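For readers less familiar with this setup, here is a minimal Python sketch of the combined-factor model. It illustrates the idea under stated assumptions and is not the analysis from the post.

- **Hypothetical inputs.** The two variables (genotype WT/MUT and treatment CTL/TRT) and the sample layout are invented, because the post does not name them.
- **What it builds.** The combined factor, the no-intercept indicator design that y ~ 0 + combo_variable corresponds to, and a contrast for one variable averaged over the levels of the other.
- **R equivalents.** In R this would usually be done with paste() or interaction(), model.matrix(~0 + combo_variable) and limma's makeContrasts().
- **Caveat.** Which contrasts were actually tested is not stated in the post.

```python
def combine_factors(var_a, var_b, sep="."):
    """Join two per-sample labels into one combined level, e.g. 'WT.CTL'."""
    return [f"{a}{sep}{b}" for a, b in zip(var_a, var_b)]


def cell_means_design(combo):
    """No-intercept design: one 0/1 indicator column per combined level."""
    levels = sorted(set(combo))
    design = [[1 if sample == level else 0 for level in levels] for sample in combo]
    return levels, design


def averaged_contrast(levels, position, hi, lo, sep="."):
    """Contrast 'hi minus lo' for the variable at `position` in the combined
    label, averaging over the levels of the other variable."""
    parts = [level.split(sep) for level in levels]
    n_hi = sum(p[position] == hi for p in parts)
    n_lo = sum(p[position] == lo for p in parts)
    weights = []
    for p in parts:
        if p[position] == hi:
            weights.append(1 / n_hi)
        elif p[position] == lo:
            weights.append(-1 / n_lo)
        else:
            weights.append(0.0)
    return weights


# Hypothetical 2 x 2 layout with two samples per cell.
genotype = ["WT", "WT", "WT", "WT", "MUT", "MUT", "MUT", "MUT"]
treatment = ["CTL", "CTL", "TRT", "TRT", "CTL", "CTL", "TRT", "TRT"]

combo = combine_factors(genotype, treatment)
levels, design = cell_means_design(combo)
treatment_effect = averaged_contrast(levels, 1, "TRT", "CTL")
genotype_effect = averaged_contrast(levels, 0, "MUT", "WT")

print(levels)            # ['MUT.CTL', 'MUT.TRT', 'WT.CTL', 'WT.TRT']
print(treatment_effect)  # [-0.5, 0.5, -0.5, 0.5]
print(genotype_effect)   # [0.5, 0.5, -0.5, -0.5]
```

The weights in each contrast sum to zero. Testing a contrast therefore compares cell means for one variable while the other variable is balanced out.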
On the other hand, when I ran roast for the contrast with few differentially expressed genes, I got few enriched gene sets. What's odd is that it did find some gene sets enriched at FDR.Mixed < 0.05 even though none of the genes in those sets were differentially expressed. Are these enriched gene sets false positives? I'm not sure what's going on here.
Please advise.
Thanks,
Julie
Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.
-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of julie.leonard at syngenta.com
Sent: Thursday, January 23, 2014 5:28 PM
To: bioconductor at r-project.org
Subject: [BioC] 2 issues about enriched gene sets via Roast
Hi. I am using roast to perform gene set enrichment analysis after doing differential expression analysis in edgeR. In this particular study I had two variables, which I combined into a single factor in the linear model: y ~ 0 + combo_variable. Most of the variance in the data was due to one variable, with very little due to the other. As a result, a large number of genes (~10,000) were found to be differentially expressed when testing the contrast for one variable, and very few (~200) when testing the contrast for the other. When I ran roast for each of these two contrasts, the one with many differentially expressed genes found almost all of the gene sets to be enriched. That is understandable given how many genes were differentially expressed, but my problem is that most of the gene sets had the same FDR, so I can't even narrow down the list of enriched gene sets by using a more stringent FDR cutoff. A subset of the output is shown below. Why would all of these gene sets have the same p-values, and therefore the same FDRs?
NGenes  PropDown    PropUp  Direction    PValue     FDR  PValue.Mixed  FDR.Mixed
   112  0.455357  0.205357  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    38  0.578947  0.210526  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    10       0.2       0.4  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
   311  0.299035  0.469453  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
   540  0.233333  0.344444  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
  1294  0.328439  0.257342  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
   317   0.29653  0.533123  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
   538  0.421933  0.256506  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
   133  0.511278  0.293233  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    39  0.589744  0.205128  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    14  0.214286  0.571429  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
    13  0.307692  0.538462  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
    36  0.472222  0.222222  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
   616  0.160714  0.688312  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
     6         1         0  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    21  0.428571  0.285714  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    65  0.415385  0.246154  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    99  0.383838  0.323232  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
    19         0  0.578947  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
   118       0.5  0.313559  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
   470  0.461702  0.323404  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
  1401  0.404711  0.250535  Down       2.00E-04  0.0002      1.00E-04   5.50E-05
   631  0.272583  0.369255  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
    55  0.236364  0.472727  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
     5       0.2       0.6  Up         2.00E-04  0.0002      1.00E-04   5.50E-05
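The ties in the table above are what one would expect if nearly every set hit the smallest p-value a rotation test can report. The sketch below rests on two assumptions, so check the limma documentation for the version in use:

- **Assumed p-value form.** Roast-style p-values are computed as (b + 1)/(nrot + 1), where b is the number of rotations whose set statistic is at least as extreme as the observed one.
- **Assumed two-sided rule.** The two-sided PValue is twice the smaller one-sided value.

Under those assumptions, b = 0 with nrot = 9999 gives exactly 1.00E-04 (mixed) and 2.00E-04 (two-sided), matching the table. That nrot value is inferred from the numbers, not stated in the post.

The sketch also shows that Benjamini-Hochberg adjustment maps tied p-values to a single shared FDR. A BH-adjusted value can never be smaller than its raw p-value, yet FDR.Mixed (5.50E-05) is below PValue.Mixed (1.00E-04) here. The FDR column therefore seems not to be plain BH on these p-values, and the BH function below only illustrates the tie behaviour.

```python
def rotation_pvalue(b, nrot):
    """One-sided rotation p-value, ASSUMING the (b + 1) / (nrot + 1) form.

    b is the number of rotations whose set statistic is at least as extreme
    as the observed one.
    """
    return (b + 1) / (nrot + 1)


def two_sided_pvalue(p_up, p_down):
    """Two-sided p-value, ASSUMED here to be twice the smaller one-sided value."""
    return min(1.0, 2 * min(p_up, p_down))


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjustment, as in R's p.adjust(method = "BH")."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


nrot = 9999  # inferred from the table, not stated in the post

# b = 0: no rotation was as extreme as the observed statistic.
floor_mixed = rotation_pvalue(0, nrot)
# Up direction at the floor, down direction at the maximum (b = nrot gives 1.0).
floor_two_sided = two_sided_pvalue(rotation_pvalue(0, nrot), rotation_pvalue(nrot, nrot))
print(floor_mixed, floor_two_sided)  # 0.0001 0.0002

# Hypothetical: 20 sets at the floor plus 3 that are not.
pvals = [floor_mixed] * 20 + [0.003, 0.01, 0.2]
fdr = bh_adjust(pvals)
print(len(set(fdr[:20])))  # 1: the tied p-values share one FDR

# More rotations lower the floor; nrot = 99999 gives 1e-05.
print(rotation_pvalue(0, 99999))
```

The floor depends only on nrot. Sets that all beat every rotation therefore cannot be separated by their p-values, or by any monotone FDR transformation of them.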
On the other hand, when I ran roast for the contrast with few differentially expressed genes, I got few enriched gene sets. What's odd is that it did find some gene sets enriched at FDR.Mixed < 0.05 even though none of the genes in those sets were differentially expressed. Are these enriched gene sets false positives? I'm not sure what's going on here.
NGenes  PropDown    PropUp  Direction    PValue       FDR  PValue.Mixed  FDR.Mixed  # DE genes
    18  0.333333         0  Down       2.00E-04   0.01636      4.00E-04   0.024994           0
     5       0.4         0  Down       2.00E-04   0.01636      5.00E-04   0.024994           0
     7  0.714286         0  Down       6.00E-04  0.045444      6.00E-04   0.024994           0
    50      0.08      0.32  Up            0.013  0.159882         0.001   0.032379           0
    49  0.346939  0.142857  Down          0.049  0.272132         0.001   0.032379           0
     4      0.25      0.25  Down         0.1722  0.457071        0.0012   0.037628           0
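One possible explanation for this second table, without the sets being false positives, is that a set-level test can be significant when many genes move modestly together. This can happen even when no single gene passes a genome-wide DE cutoff on its own.

The sketch below is a simplified analogue and not roast's own statistic:

- **Assumed null model.** Genes are independent, with standard-normal z-statistics under the null.
- **Stand-in statistic.** The sum of squared z serves as a direction-agnostic ("mixed") set statistic. Its null distribution is then chi-square with one degree of freedom per gene.
- **Hypothetical data.** The 18-gene set and the z = ±1.5 values are invented.

Roast instead uses rotations, which do not require genes to be independent, so its p-values will differ.

```python
import math
from statistics import NormalDist


def two_sided_p(z):
    """Per-gene two-sided p-value for a standard-normal statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def chi2_sf_even_df(x, df):
    """Exact upper tail P(X > x) for a chi-square variable with even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def mixed_set_pvalue(zs):
    """Direction-agnostic set p-value from the sum of z**2 (independence ASSUMED)."""
    return chi2_sf_even_df(sum(z * z for z in zs), len(zs))


# Hypothetical 18-gene set: every gene shifted modestly, half up and half down.
zs = [1.5, -1.5] * 9

per_gene = [two_sided_p(z) for z in zs]
set_p = mixed_set_pvalue(zs)

print(round(min(per_gene), 3))  # 0.134: no gene is significant on its own
print(set_p)                    # yet the set as a whole is well below 0.05
```

Because each gene's p-value is about 0.13, none would appear in a list of DE genes. The accumulated evidence across 18 genes can still make the set-level test significant.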
Please advise.
Thanks,
Julie
More information about the Bioconductor mailing list