[BioC] First time use of limma for two-color agilent array analysis

Thu Feb 27 17:29:25 CET 2014

Hi Bob,

On 2/26/2014 10:50 PM, Calin-Jageman, Robert wrote:
> I've been working on a two-color analysis of a custom-printed agilent array.  I'm examining gene expression in CNS samples from Aplysia californica that have undergone unilateral sensitization training.  I've got 8 animals, each with a sample from the trained and untrained side of the CNS.  Each pair is hybridized to one two-color array.
>
> With the limma user guide and Google, I came up with the following script which seems to give quite reasonable results:
>
> library(limma)
> targets <- readTargets("targets.txt")
> RG <- read.maimages(targets, source="agilent")
> RG <- backgroundCorrect(RG, method="normexp", offset=16)
> MA <- normalizeWithinArrays(RG, method="loess")
> MA2 <- MA[MA$genes$ControlType==0,]
> MA2.avg <- avereps(MA2, ID=MA2$genes$GeneName)
> fit <- lmFit(MA2.avg)
> fit2 <- eBayes(fit)
> allgenes <- topTable(fit2,number=nrow(fit2))
> write.table(allgenes,file="all_genes.txt")
> # I know topTable is not the only output to examine; this is just a start.
>
> Couple of quick questions for anyone out there who is reading:
> * I ended up with a much larger list of regulated transcripts than when I conducted a similar analysis in GeneSpring.  In GeneSpring I ended up with ~400 transcripts regulated at with a FDR of 0.05.  With limma, though, I get nearly 1,000 at FDR = 0.01!  Many of these, though, have fairly small fold-changes.  In fact, when I filter for at least 1.5x change, the list shrinks dramatically to about 60 transcripts.  Is the much larger list I'm initially getting due to the enhanced power of using the bayes approach for estimating sampling error, or have I failed to properly adjust for multiple comparisons in this script?

I don't know what GeneSpring uses for their univariate tests, but if 
they aren't doing something like the eBayes step, their statistics will 
be underpowered.

Note that the default for topTable is adjust = "BH", so you are 
adjusting for multiplicity. I'll leave it to others to decide if that is 
the correct way to do so. ;-D

You might also consider using treat() rather than a post hoc fold change 
adjustment. The difference being that currently you are testing against 
the null hypothesis that the log fold change == 0, and then filtering on 
fold change (so there is no statistical evidence that the |fold| is 
actually larger than zero). The treat function incorporates the fold 
change into the test, so you are directly testing that the |fold| is 
greater than some value. This is much more conservative, so you will 
likely not want a 1.5x fold change.

>
> * My qPCR data from the same samples fits the limma analysis MUCH better.  Over 15 transcripts where I've done qPCR on the same samples (mix of predicted up/down/stable), the r2 with GeneSpring estimates is about 0.5; with the limma estimates it's 0.83!  Is limma really this much better?

Like I said above, I have almost no experience with GeneSpring. But note 
that the target audience for the two platforms are quite divergent. 
GeneSpring is primarily intended for people who are willing to pay for a 
product that makes it easier for them to get their work done. Limma is 
the product of a lab that uses it for their own research, and while I am 
sure ease of use comes into play, the overarching goal is to produce 
software that improves power in a statistically rigorous way.

>
> * I'm not sure if I fully understand the choice of offset in background correction.  I've seen no offset, offsets of 16, 30.... Despite reading the documentation on the function, it's not clear to me how users select an offset or if it's actually worth using.  Any feedback?

I went to the mailing list page 
http://www.bioconductor.org/help/mailing-list/

and then searched using

backgroundCorrect offset smyth

and the first hit I got was

https://stat.ethz.ch/pipermail/bioconductor/2011-August/040885.html

Best,

Jim

>
> Cheers,
>
> Bob
>
>
>
> ========
> Robert Calin-Jageman
> Associate Professor, Psychology
> Neuroscience Program Director
> Dominican University
> rcalinjageman at dom.edu
> 708.524.6581
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099