[BioC] First time use of limma for two-color agilent array analysis
James W. MacDonald
jmacdon at uw.edu
Thu Feb 27 17:29:25 CET 2014
Hi Bob,
On 2/26/2014 10:50 PM, Calin-Jageman, Robert wrote:
> I've been working on a two-color analysis of a custom-printed agilent array. I'm examining gene expression in CNS samples from Aplysia californica that have undergone unilateral sensitization training. I've got 8 animals, each with a sample from the trained and untrained side of the CNS. Each pair is hybridized to one two-color array.
>
> With the limma user guide and Google, I came up with the following script which seems to give quite reasonable results:
>
> library(limma)
> targets <- readTargets("targets.txt")
> RG <- read.maimages(targets, source="agilent")
> RG <- backgroundCorrect(RG, method="normexp", offset=16)
> MA <- normalizeWithinArrays(RG, method="loess")
> MA2 <- MA[MA$genes$ControlType==0,]
> MA2.avg <- avereps(MA2, ID=MA2$genes$GeneName)
> fit <- lmFit(MA2.avg)
> fit2 <- eBayes(fit)
> allgenes <- topTable(fit2,number=nrow(fit2))
> write.table(allgenes,file="all_genes.txt")
> # I know topTable is not the only output to examine; this is just a start.
>
> Couple of quick questions for anyone out there who is reading:
> * I ended up with a much larger list of regulated transcripts than when I conducted a similar analysis in GeneSpring. In GeneSpring I ended up with ~400 transcripts regulated at with a FDR of 0.05. With limma, though, I get nearly 1,000 at FDR = 0.01! Many of these, though, have fairly small fold-changes. In fact, when I filter for at least 1.5x change, the list shrinks dramatically to about 60 transcripts. Is the much larger list I'm initially getting due to the enhanced power of using the bayes approach for estimating sampling error, or have I failed to properly adjust for multiple comparisons in this script?
I don't know what GeneSpring uses for their univariate tests, but if
they aren't doing something like the eBayes step, their statistics will
be underpowered.
Note that the default for topTable is adjust = "BH", so you are
adjusting for multiplicity. I'll leave it to others to decide if that is
the correct way to do so. ;-D
You might also consider using treat() rather than a post hoc fold change
adjustment. The difference being that currently you are testing against
the null hypothesis that the log fold change == 0, and then filtering on
fold change (so there is no statistical evidence that the |fold| is
actually larger than zero). The treat function incorporates the fold
change into the test, so you are directly testing that the |fold| is
greater than some value. This is much more conservative, so you will
likely not want a 1.5x fold change.
>
> * My qPCR data from the same samples fits the limma analysis MUCH better. Over 15 transcripts where I've done qPCR on the same samples (mix of predicted up/down/stable), the r2 with GeneSpring estimates is about 0.5; with the limma estimates it's 0.83! Is limma really this much better?
Like I said above, I have almost no experience with GeneSpring. But note
that the target audience for the two platforms are quite divergent.
GeneSpring is primarily intended for people who are willing to pay for a
product that makes it easier for them to get their work done. Limma is
the product of a lab that uses it for their own research, and while I am
sure ease of use comes into play, the overarching goal is to produce
software that improves power in a statistically rigorous way.
>
> * I'm not sure if I fully understand the choice of offset in background correction. I've seen no offset, offsets of 16, 30.... Despite reading the documentation on the function, it's not clear to me how users select an offset or if it's actually worth using. Any feedback?
I went to the mailing list page
http://www.bioconductor.org/help/mailing-list/
and then searched using
backgroundCorrect offset smyth
and the first hit I got was
https://stat.ethz.ch/pipermail/bioconductor/2011-August/040885.html
Best,
Jim
>
> Cheers,
>
> Bob
>
>
>
> ========
> Robert Calin-Jageman
> Associate Professor, Psychology
> Neuroscience Program Director
> Dominican University
> rcalinjageman at dom.edu
> 708.524.6581
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list