[BioC] help in extracting limma results
James W. MacDonald
jmacdon at med.umich.edu
Wed Jan 25 16:16:38 CET 2006
echang4 at life.uiuc.edu wrote:
> Hi Bioconductor users,
> I am having trouble understanding how multiple-testing adjustment is done
> in limma (specifically the decideTest). I am really confused about the
> meaning between moderated F p-value and the adjusted p-value.
>
> If I try two different "flavors" of decideTest (e.g. nestedF and global),
> I can see that the results are different.... but is it the p-value that is
> adjusted or the F-statistic is adjusted?
The statistics themselves are never adjusted in decideTests(), only the
p-values.
The difference is in how the p-values are adjusted. For the 'global'
option, all the contrasts are considered to be independent, and the
p-values are adjusted as if you just had a bunch of independent t-tests.
The nestedF option is a bit more complicated. First, a bit of
background. The F-statistic is used to determine if there are any
differences between the samples, but it doesn't tell you which sample(s)
are different. You have to fit contrasts to find out which sample(s) are
different.
So the idea with the nestedF is to adjust the p-values associated with
the F-test to find which genes are differentially expressed in at least
one sample. Now we have a list of genes that are differentially
expressed, but we don't know for which sample(s) that may be true. The
t-statistics associated with the contrasts are then inspected and the
largest one (in absolute value) is considered significant. Now, there
may be other contrasts that are significant as well, so the largest
t-statistic is set to the same absolute value as the second largest
t-statistic, and the F-statistic is calculated again. If the F-statistic
is still significant, the second largest contrast is considered
significant. This procedure is continued until the F-statistic is no
longer significant.
The basic reasoning here is that the largest t-statistic for a set of
contrasts is significant if the overall F-statistic is significant. By
following this step-wise procedure, we can determine which contrasts are
contributing to the overall significance of the F-statistic.
> And how do I extract the gene lists (cutoff of FDR-adjusted p-value< 0.05)?
There are three ways that I know of. First, you can use write.fit(),
which is part of limma. Something like:
write.fit(fit2, results2, "myoutputdata.txt", adjust="BH")
will output all your genes in a text file that you can then open up in
e.g., Excel and filter based on the different columns. Note that the
p-values associated with the contrasts will be adjusted, but the
p-values associated with the F-statistics will not, so this is
equivalent to decideTests() with method = "separate".
If you are using Affy chips, you could use limma2annaffy() in
affycoretools. This will output either HTML or text tables (or both)
containing the expression values, statistics, p-values and links to
online databases. I don't know your platform, so I won't go into any
more detail other than to say that affycoretools is in the devel
repository, so you will either need R-2.3.0 if you want to use
biocLite() to get it, or will have to download and install by hand if
you have an earlier version of R.
A third option would be to extract the information you want and output
it by 'hand'. This would involve a bit of work on your part. To fully
explain what you need to do would take too long, so I will just give
some pointers below.
>
> I basically followed the limma user guide (which is superbly written,
> thank you!) But after completing the analysis, I am stuck trying to
> extract the gene lists out of R and into a csv file.
>
> Can someone please help me out with the code which would extract the genes
> under "results2"? My code is as follows:
>
>
>>cont.matrix <- makeContrasts(
>
> Ad.V_E2 = Ad.E2-Ad.veh,
> b20.V_E2 = b20.E2-b20.veh,
> Veh_modul = b20.veh-Ad.veh,
> E2_intrxn = (b20.E2-b20.veh)-(Ad.E2-Ad.veh), levels=design)
>
>>fit2 <- contrasts.fit(fit, cont.matrix)
>>fit2 <- eBayes(fit2)
>>results2 <- decideTests(fit2, method="nestedF")
>>summary(results2)
>
> Ad.V_E2 b20.V_E2 Veh_modul E2_intrxn
> -1 729 699 1108 224
> 0 20651 20662 20662 21830
> 1 903 922 513 229
>
>
>>results2 <- decideTests(fit2, method="global")
>>summary(results2)
>
> Ad.V_E2 b20.V_E2 Veh_modul E2_intrxn
> -1 501 460 641 101
> 0 21174 21166 21311 22064
> 1 608 657 331 118
>
>>write.csv(topTable(fit2, coef="Ad.V_E2", n=22283), "Ad+E2.csv",
>
> row.names=FALSE)
> ***If I do this, won't I end up with the unadjusted p-values?***
No, you will have "BH" adjusted p-values. You can see what the defaults
for a given function are by looking at the man page. For example
?topTable will tell you:
adjust.method: method used to adjust the p-values for multiple testing.
Options, in increasing conservatism, include '"none"',
'"BH"', '"BY"' and '"holm"'. See 'p.adjust' for the complete
list of options. A 'NULL' value will result in the default
adjustment method, which is '"BH"'.
>
> I also see other examples like this....how can I export genes in
> "selected"?
>
>>selected<- p.adjust(fit2$F.p.value, method="fdr") < 0.05
'selected' is a logical vector with TRUE for any F-statistic with an
adjusted p-value less than 0.05 and FALSE otherwise. You can use this
vector to subset other parts of your MarrayLM object and build up a
data.frame that you could then output.
For instance fit2$t[selected,] will give you a matrix containing all the
t-statistics for the set of significant genes. You can get the
coefficients using fit2$coefficient[selected,], etc.
You might want to read 'An Introduction to R', which can be accessed by
typing help.start() and looking at the top left of the page that pops
up. This should help explain some of the data structures that are used
in R, as well as ways to manipulate them.
HTH,
Jim
>
>
> Basically, I'm stuck.
> Any help would be greatly appreciated. I really did read the limma user
> guide, but if anyone could point sections which would help me, I'd very
> much appreciate that as well.
>
> Thank you,
> Edmund Chang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
More information about the Bioconductor
mailing list