[BioC] 2 issues about enriched gene sets via Roast

julie.leonard at syngenta.com julie.leonard at syngenta.com
Fri Jan 24 16:19:35 CET 2014


Same questions minus the unformatted table data.

Hi.   I am using Roast to perform gene set enrichment analysis after doing differential expression analysis in edgeR.  In this particular study, I had 2 variables which I joined to create a single factor in the linear model:  y ~ 0 + combo_variable.  Looking at the variance in the data, most of the variance was due to 1 variable and there was very little variance due to the other variable.   Thus, large numbers of genes (~10,000) were found to be differentially expressed when testing the contrast for one variable and very few genes (~200) were found to be differentially expressed when testing the contrast for the other variable.   When I ran roast for each of these 2 contrasts, the one that had lots of differentially expressed genes found almost all of the gene sets to be enriched.  This is understandable since there were lots of genes differentially expressed, but my problem is that most of the gene sets had the same FDRs.  Thus I can't even narrow down the list of enriched gene sets by using a more stringent FDR cutoff.  Why would all of these gene sets have the same p-values and thus the same FDRs??

On the other hand, when I ran roast for the contrast with few genes differentially expressed, I got few gene sets enriched.  But what's odd is it did find some gene sets enriched with FDR.Mixed < 0.05, but none of the genes in the gene set were differentially expressed.  Are these enriched gene sets false positives?  I'm not sure what's going on here.

Please advise.

Thanks,
Julie

Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of julie.leonard at syngenta.com
Sent: Thursday, January 23, 2014 5:28 PM
To: bioconductor at r-project.org
Subject: [BioC] 2 issues about enriched gene sets via Roast

Hi.   I am using Roast to perform gene set enrichment analysis after doing differential expression analysis in edgeR.  In this particular study, I had 2 variables which I joined to create a single factor in the linear model:  y ~ 0 + combo_variable.  Looking at the variance in the data, most of the variance was due to 1 variable and there was very little variance due to the other variable.   Thus, large numbers of genes (~10,000) were found to be differentially expressed when testing the contrast for one variable and very few genes (~200) were found to be differentially expressed when testing the contrast for the other variable.   When I ran roast for each of these 2 contrasts, the one that had lots of differentially expressed genes found almost all of the gene sets to be enriched.  This is understandable since there were lots of genes differentially expressed, but my problem is that most of the gene sets had the same FDRs.  Thus I can't even narrow down the list of enriched!
  gene sets by using a more stringent FDR cutoff.  A subset of the output is shown below.   Why would all of these gene sets have the same p-values and thus the same FDRs??

NGenes

PropDown

PropUp

Direction

PValue

FDR

PValue.Mixed

FDR.Mixed

112

0.455357

0.205357

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

38

0.578947

0.210526

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

10

0.2

0.4

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

311

0.299035

0.469453

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

540

0.233333

0.344444

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

1294

0.328439

0.257342

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

317

0.29653

0.533123

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

538

0.421933

0.256506

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

133

0.511278

0.293233

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

39

0.589744

0.205128

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

14

0.214286

0.571429

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

13

0.307692

0.538462

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

36

0.472222

0.222222

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

616

0.160714

0.688312

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

6

1

0

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

21

0.428571

0.285714

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

65

0.415385

0.246154

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

99

0.383838

0.323232

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

19

0

0.578947

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

118

0.5

0.313559

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

470

0.461702

0.323404

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

1401

0.404711

0.250535

Down

2.00E-04

0.0002

1.00E-04

5.50E-05

631

0.272583

0.369255

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

55

0.236364

0.472727

Up

2.00E-04

0.0002

1.00E-04

5.50E-05

5

0.2

0.6

Up

2.00E-04

0.0002

1.00E-04

5.50E-05



On the other hand, when I ran roast for the contrast with few genes differentially expressed, I got few gene sets enriched.  But what's odd is it did find some gene sets enriched with FDR.Mixed < 0.05, but none of the genes in the gene set were differentially expressed.  Are these enriched gene sets false positives?  I'm not sure what's going on here.

NGenes

PropDown

PropUp

Direction

PValue

FDR

PValue.Mixed

FDR.Mixed

# DE genes

18

0.333333

0

Down

2.00E-04

0.01636

4.00E-04

0.024994

0

5

0.4

0

Down

2.00E-04

0.01636

5.00E-04

0.024994

0

7

0.714286

0

Down

6.00E-04

0.045444

6.00E-04

0.024994

0

50

0.08

0.32

Up

0.013

0.159882

0.001

0.032379

0

49

0.346939

0.142857

Down

0.049

0.272132

0.001

0.032379

0

4

0.25

0.25

Down

0.1722

0.457071

0.0012

0.037628

0


Please advise.

Thanks,
Julie


Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.




This message may contain confidential information. If yo...{{dropped:17}}



More information about the Bioconductor mailing list