[BioC] help in extracting limma results

Wed Jan 25 16:16:38 CET 2006

echang4 at life.uiuc.edu wrote:
> Hi Bioconductor users,
> I am having trouble understanding how multiple-testing adjustment is done
> in limma (specifically the decideTest).  I am really confused about the
> meaning between moderated F p-value and the adjusted p-value.
> 
> If I try two different "flavors" of decideTest (e.g. nestedF and global),
> I can see that the results are different.... but is it the p-value that is
> adjusted or the F-statistic is adjusted?

The statistics themselves are never adjusted in decideTests(), only the 
p-values.

The difference is in how the p-values are adjusted. For the 'global' 
option, all the contrasts are considered to be independent, and the 
p-values are adjusted as if you just had a bunch of independent t-tests.

The nestedF option is a bit more complicated. First, a bit of 
background. The F-statistic is used to determine if there are any 
differences between the samples, but it doesn't tell you which sample(s) 
are different. You have to fit contrasts to find out which sample(s) are 
different.

So the idea with the nestedF is to adjust the p-values associated with 
the F-test to find which genes are differentially expressed in at least 
one sample. Now we have a list of genes that are differentially 
expressed, but we don't know for which sample(s) that may be true. The 
t-statistics associated with the contrasts are then inspected and the 
largest one (in absolute value) is considered significant. Now, there 
may be other contrasts that are significant as well, so the largest 
t-statistic is set to the same absolute value as the second largest 
t-statistic, and the F-statistic is calculated again. If the F-statistic 
is still significant, the second largest contrast is considered 
significant. This procedure is continued until the F-statistic is no 
longer significant.

The basic reasoning here is that the largest t-statistic for a set of 
contrasts is significant if the overall F-statistic is significant. By 
following this step-wise procedure, we can determine which contrasts are 
contributing to the overall significance of the F-statistic.

> And how do I extract the gene lists (cutoff of FDR-adjusted p-value< 0.05)?

There are three ways that I know of. First, you can use write.fit(), 
which is part of limma. Something like:

write.fit(fit2, results2, "myoutputdata.txt", adjust="BH")

will output all your genes in a text file that you can then open up in 
e.g., Excel and filter based on the different columns. Note that the 
p-values associated with the contrasts will be adjusted, but the 
p-values associated with the F-statistics will not, so this is 
equivalent to decideTests() with method = "separate".

If you are using Affy chips, you could use limma2annaffy() in 
affycoretools. This will output either HTML or text tables (or both) 
containing the expression values, statistics, p-values and links to 
online databases. I don't know your platform, so I won't go into any 
more detail other than to say that affycoretools is in the devel 
repository, so you will either need R-2.3.0 if you want to use 
biocLite() to get it, or will have to download and install by hand if 
you have an earlier version of R.

A third option would be to extract the information you want and output 
it by 'hand'. This would involve a bit of work on your part. To fully 
explain what you need to do would take too long, so I will just give 
some pointers below.

> 
> I basically followed the limma user guide (which is superbly written,
> thank you!) But after completing the analysis, I am stuck trying to
> extract the gene lists out of R and into a csv file.
> 
> Can someone please help me out with the code which would extract the genes
> under "results2"? My code is as follows:
> 
> 
>>cont.matrix <- makeContrasts(
> 
> Ad.V_E2 = Ad.E2-Ad.veh,
> b20.V_E2 = b20.E2-b20.veh,
> Veh_modul = b20.veh-Ad.veh,
> E2_intrxn = (b20.E2-b20.veh)-(Ad.E2-Ad.veh), levels=design)
> 
>>fit2 <- contrasts.fit(fit, cont.matrix)
>>fit2 <- eBayes(fit2)
>>results2 <- decideTests(fit2, method="nestedF")
>>summary(results2)
> 
>    Ad.V_E2 b20.V_E2 Veh_modul E2_intrxn
> -1     729      699      1108       224
> 0    20651    20662     20662     21830
> 1      903      922       513       229
> 
> 
>>results2 <- decideTests(fit2, method="global")
>>summary(results2)
> 
>    Ad.V_E2 b20.V_E2 Veh_modul E2_intrxn
> -1     501      460       641       101
> 0    21174    21166     21311     22064
> 1      608      657       331       118
> 
>>write.csv(topTable(fit2, coef="Ad.V_E2", n=22283), "Ad+E2.csv",
> 
> row.names=FALSE)
> ***If I do this, won't I end up with the unadjusted p-values?***

No, you will have "BH" adjusted p-values. You can see what the defaults 
for a given function are by looking at the man page. For example 
?topTable will tell you:

adjust.method: method used to adjust the p-values for multiple testing.
            Options, in increasing conservatism, include '"none"',
           '"BH"', '"BY"' and '"holm"'. See 'p.adjust' for the complete
           list of options. A 'NULL' value will result in the default
           adjustment method, which is '"BH"'.
> 
> I also see other examples like this....how can I export genes in
> "selected"?
> 
>>selected<- p.adjust(fit2$F.p.value, method="fdr") < 0.05

'selected' is a logical vector with TRUE for any F-statistic with an 
adjusted p-value less than 0.05 and FALSE otherwise. You can use this 
vector to subset other parts of your MarrayLM object and build up a 
data.frame that you could then output.

For instance fit2$t[selected,] will give you a matrix containing all the 
t-statistics for the set of significant genes. You can get the 
coefficients using fit2$coefficient[selected,], etc.

You might want to read 'An Introduction to R', which can be accessed by 
typing help.start() and looking at the top left of the page that pops 
up. This should help explain some of the data structures that are used 
in R, as well as ways to manipulate them.

HTH,

Jim

> 
> 
> Basically, I'm stuck.
> Any help would be greatly appreciated. I really did read the limma user
> guide, but if anyone could point sections which would help me, I'd very
> much appreciate that as well.
> 
> Thank you,
> Edmund Chang
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623