[BioC] limma all adj.P.Val the same

Wed May 23 16:01:18 CEST 2012

On Mon, May 21, 2012 at 12:22 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Sam,
>
>
> On 5/21/2012 12:12 PM, Sam McInturf wrote:
>>
>> Hello Everyone,
>>    I am working on expression profiling of wheat, where five tissue
>> types were analyzed at anthesis (flowering) and 14 days after anthesis
>> (DAA14).  Because of the amount of development that occurs during the
>> first two weeks of grain filling, we expect to see differentially
>> expressed genes (Code below). I approached this as a 5x2 factorial
>> problem and took my lead from the limma guide (ch 8.7).  When calling
>> topTable for any contrast, the list of P.Values contains a range of
>> values (small small, some large).  While the adj.P.Val column contains
>> the same (large) value for every gene reported ( up to n = 10,000).
>>
>> The adj.P.Vals that comeback for each contrasts range from 0.87 to
>> 0.9999..
>>
>> Note the "ABSURD" contrast, when comparing different tissue types, we
>> expect a lot of DE genes, but still no DE reported
>>
>> I subsetted my data to only analyze one tissue across the two times,
>> and then doing the analysis as below, and if it is approached it as a
>> 'comparison of two groups', and still was returned adjusted p values
>> that were all the same, high value.
>>
>> Any idea why my q-values are all the same?
>
>
> Without any more information it is difficult to say. The stock answer would
> be that you don't have enough power to detect any differences. I have seen
> this on occasion, especially when there isn't much replication.
>
> You could help us by giving your targets file and an example of one of your
> topTables.
>
>
>
>
>>
>> ################################################
>> library(limma)
>> library(affy)
>> #read in our data
>> target<- readTargets("--------/TargetCEL.txt")
>> data<- ReadAffy(filenames = target$FileName)
>>
>> eset<- rma(data)
>> #Building our model.  Let us model this as a 5x2 factorial problem
>> (five tissues, two times)
>> factors<- paste(target$Timing, target$Sample, sep = ".")
>> factors<- factor(factors, levels = c("ANTH.SPIKE",  "ANTH.PEDUNCLE",
>> "ANTH.STEM", "ANTH.FLAGLEAF",  "ANTH.NODES", "DAA14.SPIKE",
>> "DAA14.PEDUNCLE", "DAA14.STEM", "DAA14.FLAGLEAF", "DAA14.NODES"))
>> design<- model.matrix(~0+factors)
>> colnames(design)<- levels(factors)
>> contrastMat.TissueWise<- makeContrasts(
>>                  Spike = DAA14.SPIKE - ANTH.SPIKE,
>>                  PEDUNCLE = DAA14.PEDUNCLE - ANTH.PEDUNCLE,
>>                  STEM = DAA14.STEM - ANTH.STEM,
>>                  FLAGLEAF = DAA14.FLAGLEAF - ANTH.FLAGLEAF,
>>                  NODES = DAA14.NODES - ANTH.NODES,
>>                  ABSURD = ANTH.STEM - DAA14.SPIKE,
>>                   levels = design
>>                  )
>>
>> fit3<- lmFit(eset, design)
>> fit2<- contrasts.fit(fit3, contrastMat.TissueWise)
>> fit<- eBayes(fit2)
>>
>> topTable(fit2, coef = ___, n = 100)
>
>
> I am assuming this is a typo (fit2 rather than fit)?
>
> Best,
>
> Jim
>
>
>> --
>> Sam McInturf
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
Sam McInturf

Without any more information it is difficult to say. The stock answer
would be that you don't have enough power to detect any differences. I
have seen this on occasion, especially when there isn't much
replication.

You could help us by giving your targets file and an example of one of
your topTables.

I am assuming this is a typo (fit2 rather than fit)?

Best,

Jim

============================================================
Jim,
here is my target file and three topTables.
> target
                                   FileName
Timing   Sample          Rep
1         1 Wheat Genome 4-26-12_(wheat).CEL      ANTH     SPIKE            A
2   10 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     PEDUNCLE   A
3   11 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     STEM           A
4   12 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     FLAGLEAF   A
5   13 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     NODES        A
6   14 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    SPIKE           A
7   15 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    PEDUNCLE   A
8   16 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    STEM            A
9   17 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    FLAGLEAF   A
10  18 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14   NODES        A
11  19 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH     SPIKE          B
12        2 Wheat Genome 4-26-12_(wheat).CEL     ANTH     PEDUNCLE   B
13  20 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH     STEM           B
14  21 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH    FLAGLEAF   B
15  22 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH    NODES        B
16  23 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   SPIKE           B
17  24 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   PEDUNCLE   B
18  25 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   STEM           B
19  26 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   FLAGLEAF   B
20  27 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   NODES        B
21  28 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH    SPIKE           C
22 29 Wheat Genome Array 5-10-12_(wheat).CEL ANTH     PEDUNCLE   C
23        3 Wheat Genome 4-26-12_(wheat).CEL     ANTH     STEM           C
24  30 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH FLAGLEAF      C
25        4 Wheat Genome 4-26-12_(wheat).CEL   ANTH    NODES            C
26        5 Wheat Genome 4-26-12_(wheat).CEL  DAA14    SPIKE            C
27        6 Wheat Genome 4-26-12_(wheat).CEL  DAA14 PEDUNCLE       C
28        7 Wheat Genome 4-26-12_(wheat).CEL  DAA14     STEM            C
29   8 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14 FLAGLEAF      C
30   9 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    NODES        C

>topTable(fit, coef = "Spike")
                          ID                 logFC         AveExpr
     t      P.Value        adj.P.Val         B
54657    TaAffx.78175.1.S1_at  0.4215108 3.216996  4.514525
0.0001717272 0.9999869 -3.269251
40437    TaAffx.13113.3.S1_at  0.4911042 3.912458  4.156361
0.0004119435 0.9999869 -3.402249
55069    TaAffx.79041.1.S1_at  0.3726910 2.666774  4.063081
0.0005172708 0.9999869 -3.438121
39227    TaAffx.12387.2.S1_at -0.7312518 3.934821 -3.968346
0.0006516554 0.9999869 -3.475043
33239   TaAffx.107821.1.S1_at  0.3374931 3.718697  3.957133
0.0006696985 0.9999869 -3.479445
53443    TaAffx.66663.2.S1_at  0.5686773 4.155875  3.925068
0.0007240767 0.9999869 -3.492069
43703    TaAffx.28502.1.S1_at  0.3998569 4.607502  3.915422
0.0007412758 0.9999869 -3.495877
36163   TaAffx.112860.1.S1_at  0.3813519 3.401501  3.899640
0.0007702914 0.9999869 -3.502118
35389   TaAffx.111828.1.S1_at  0.5061669 4.090638  3.799882
0.0009815673 0.9999869 -3.541854
33890 TaAffx.108878.1.S1_x_at  0.3026730 3.772136  3.779824
0.0010305082 0.9999869 -3.549902

> topTable(fit, coef = "ABSURD")
                        ID                logFC        AveExpr
 t          P.Value        adj.P.Val         B
32597 TaAffx.106447.1.S1_at -0.7516767 3.714621 -4.451899 0.0002001067
0.9979636 -3.291944
1598       Ta.10910.1.A1_at -0.4429634 3.524978 -4.417483 0.0002176579
0.9979636 -3.304519
15208      Ta.25816.1.S1_at -0.4721049 6.752687 -4.246018 0.0003309222
0.9979636 -3.368238
57638   TaAffx.8440.1.S1_at  0.4725505 3.198740  4.106344 0.0004654457
0.9979636 -3.421423
8110      Ta.1735.2.S1_a_at  0.5125682 3.656730  4.019142 0.0005757797
0.9979636 -3.455186
50758  TaAffx.57221.1.S1_at -0.4975976 3.789495 -4.005497 0.0005952539
0.9979636 -3.460506
58501  TaAffx.85842.1.S1_at -0.4646994 3.862172 -4.004599 0.0005965581
0.9979636 -3.460857
30327       Ta.9283.2.S1_at -0.3770003 3.214838 -3.980915 0.0006320034
0.9979636 -3.470117
9090     Ta.18507.1.S1_s_at -0.4499434 2.592315 -3.962537 0.0006609422
0.9979636 -3.477323
44052  TaAffx.29281.1.S1_at -0.5103668 3.360355 -3.933707 0.0007090089
0.9979636 -3.488663

> topTable(fit, coef = "ABSURD", p.value = 0.05)
data frame with 0 columns and 0 rows

Best!

Samuel
=====================================================

Ugh. Stupid line wraps.

So, three things. First, with only three replications you won't be
able to reliably detect very small changes (and the fold changes you
are seeing here are very small, barely even 1.5-fold on the natural
scale).

Second, the unadjusted p-values are pretty big, so it isn't surprising
that you have such large adjusted p-values.

Third, the average expression levels you are showing are really small.
I have no experience with the Wheat array, but for the mammalian
versions of the 3'-biased arrays I am used to seeing much larger
expression values. This may be expected, but if I were you I would be
doing some exploratory data analysis (image plots, density plots, etc,
maybe use arrayQualityMetrics package).

You could also do a PCA plot to see if your samples are grouping as
you expect. I often fit array weights in limma and append them to the
PCA plot to see if weighting might help (see plotPCA, c.f. addtext
argument in affycoretools).

Best,

Jim

+============================================================
James,
   new user, how do I keep these on the list serve?  Should I just
copy paste the conversation and send it to bioconductor at r-project.org?

Thank you for your suggestion of using PCA!
The embarrassing mistake is that my design matrix did not accurately
reflect the treatments for each chip, because it was sorted using
linux order (1, 10, 11... 2, 20, 21).

I was able to catch this mistake because of the PCA plots (
plotPCA(plot3d = TRUE, pcs = 1:3) really showed the pattern.

Thank you for your help!