[BioC] limma all adj.P.Val the same
Sam McInturf
smcinturf at gmail.com
Wed May 23 16:01:18 CEST 2012
On Mon, May 21, 2012 at 12:22 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Sam,
>
>
> On 5/21/2012 12:12 PM, Sam McInturf wrote:
>>
>> Hello Everyone,
>> I am working on expression profiling of wheat, where five tissue
>> types were analyzed at anthesis (flowering) and 14 days after anthesis
>> (DAA14). Because of the amount of development that occurs during the
>> first two weeks of grain filling, we expect to see differentially
>> expressed genes (Code below). I approached this as a 5x2 factorial
>> problem and took my lead from the limma guide (ch 8.7). When calling
>> topTable for any contrast, the list of P.Values contains a range of
>> values (small small, some large). While the adj.P.Val column contains
>> the same (large) value for every gene reported ( up to n = 10,000).
>>
>> The adj.P.Vals that comeback for each contrasts range from 0.87 to
>> 0.9999..
>>
>> Note the "ABSURD" contrast, when comparing different tissue types, we
>> expect a lot of DE genes, but still no DE reported
>>
>> I subsetted my data to only analyze one tissue across the two times,
>> and then doing the analysis as below, and if it is approached it as a
>> 'comparison of two groups', and still was returned adjusted p values
>> that were all the same, high value.
>>
>> Any idea why my q-values are all the same?
>
>
> Without any more information it is difficult to say. The stock answer would
> be that you don't have enough power to detect any differences. I have seen
> this on occasion, especially when there isn't much replication.
>
> You could help us by giving your targets file and an example of one of your
> topTables.
>
>
>
>
>>
>> ################################################
>> library(limma)
>> library(affy)
>> #read in our data
>> target<- readTargets("--------/TargetCEL.txt")
>> data<- ReadAffy(filenames = target$FileName)
>>
>> eset<- rma(data)
>> #Building our model. Let us model this as a 5x2 factorial problem
>> (five tissues, two times)
>> factors<- paste(target$Timing, target$Sample, sep = ".")
>> factors<- factor(factors, levels = c("ANTH.SPIKE", "ANTH.PEDUNCLE",
>> "ANTH.STEM", "ANTH.FLAGLEAF", "ANTH.NODES", "DAA14.SPIKE",
>> "DAA14.PEDUNCLE", "DAA14.STEM", "DAA14.FLAGLEAF", "DAA14.NODES"))
>> design<- model.matrix(~0+factors)
>> colnames(design)<- levels(factors)
>> contrastMat.TissueWise<- makeContrasts(
>> Spike = DAA14.SPIKE - ANTH.SPIKE,
>> PEDUNCLE = DAA14.PEDUNCLE - ANTH.PEDUNCLE,
>> STEM = DAA14.STEM - ANTH.STEM,
>> FLAGLEAF = DAA14.FLAGLEAF - ANTH.FLAGLEAF,
>> NODES = DAA14.NODES - ANTH.NODES,
>> ABSURD = ANTH.STEM - DAA14.SPIKE,
>> levels = design
>> )
>>
>> fit3<- lmFit(eset, design)
>> fit2<- contrasts.fit(fit3, contrastMat.TissueWise)
>> fit<- eBayes(fit2)
>>
>> topTable(fit2, coef = ___, n = 100)
>
>
> I am assuming this is a typo (fit2 rather than fit)?
>
> Best,
>
> Jim
>
>
>> --
>> Sam McInturf
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
--
Sam McInturf
Without any more information it is difficult to say. The stock answer
would be that you don't have enough power to detect any differences. I
have seen this on occasion, especially when there isn't much
replication.
You could help us by giving your targets file and an example of one of
your topTables.
I am assuming this is a typo (fit2 rather than fit)?
Best,
Jim
============================================================
Jim,
here is my target file and three topTables.
> target
FileName
Timing Sample Rep
1 1 Wheat Genome 4-26-12_(wheat).CEL ANTH SPIKE A
2 10 Wheat Genome Array 5-3-12_(wheat).CEL ANTH PEDUNCLE A
3 11 Wheat Genome Array 5-3-12_(wheat).CEL ANTH STEM A
4 12 Wheat Genome Array 5-3-12_(wheat).CEL ANTH FLAGLEAF A
5 13 Wheat Genome Array 5-3-12_(wheat).CEL ANTH NODES A
6 14 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 SPIKE A
7 15 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 PEDUNCLE A
8 16 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 STEM A
9 17 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 FLAGLEAF A
10 18 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 NODES A
11 19 Wheat Genome Array 5-8-12_(wheat).CEL ANTH SPIKE B
12 2 Wheat Genome 4-26-12_(wheat).CEL ANTH PEDUNCLE B
13 20 Wheat Genome Array 5-8-12_(wheat).CEL ANTH STEM B
14 21 Wheat Genome Array 5-8-12_(wheat).CEL ANTH FLAGLEAF B
15 22 Wheat Genome Array 5-8-12_(wheat).CEL ANTH NODES B
16 23 Wheat Genome Array 5-8-12_(wheat).CEL DAA14 SPIKE B
17 24 Wheat Genome Array 5-8-12_(wheat).CEL DAA14 PEDUNCLE B
18 25 Wheat Genome Array 5-8-12_(wheat).CEL DAA14 STEM B
19 26 Wheat Genome Array 5-8-12_(wheat).CEL DAA14 FLAGLEAF B
20 27 Wheat Genome Array 5-8-12_(wheat).CEL DAA14 NODES B
21 28 Wheat Genome Array 5-8-12_(wheat).CEL ANTH SPIKE C
22 29 Wheat Genome Array 5-10-12_(wheat).CEL ANTH PEDUNCLE C
23 3 Wheat Genome 4-26-12_(wheat).CEL ANTH STEM C
24 30 Wheat Genome Array 5-8-12_(wheat).CEL ANTH FLAGLEAF C
25 4 Wheat Genome 4-26-12_(wheat).CEL ANTH NODES C
26 5 Wheat Genome 4-26-12_(wheat).CEL DAA14 SPIKE C
27 6 Wheat Genome 4-26-12_(wheat).CEL DAA14 PEDUNCLE C
28 7 Wheat Genome 4-26-12_(wheat).CEL DAA14 STEM C
29 8 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 FLAGLEAF C
30 9 Wheat Genome Array 5-3-12_(wheat).CEL DAA14 NODES C
>topTable(fit, coef = "Spike")
ID logFC AveExpr
t P.Value adj.P.Val B
54657 TaAffx.78175.1.S1_at 0.4215108 3.216996 4.514525
0.0001717272 0.9999869 -3.269251
40437 TaAffx.13113.3.S1_at 0.4911042 3.912458 4.156361
0.0004119435 0.9999869 -3.402249
55069 TaAffx.79041.1.S1_at 0.3726910 2.666774 4.063081
0.0005172708 0.9999869 -3.438121
39227 TaAffx.12387.2.S1_at -0.7312518 3.934821 -3.968346
0.0006516554 0.9999869 -3.475043
33239 TaAffx.107821.1.S1_at 0.3374931 3.718697 3.957133
0.0006696985 0.9999869 -3.479445
53443 TaAffx.66663.2.S1_at 0.5686773 4.155875 3.925068
0.0007240767 0.9999869 -3.492069
43703 TaAffx.28502.1.S1_at 0.3998569 4.607502 3.915422
0.0007412758 0.9999869 -3.495877
36163 TaAffx.112860.1.S1_at 0.3813519 3.401501 3.899640
0.0007702914 0.9999869 -3.502118
35389 TaAffx.111828.1.S1_at 0.5061669 4.090638 3.799882
0.0009815673 0.9999869 -3.541854
33890 TaAffx.108878.1.S1_x_at 0.3026730 3.772136 3.779824
0.0010305082 0.9999869 -3.549902
> topTable(fit, coef = "ABSURD")
ID logFC AveExpr
t P.Value adj.P.Val B
32597 TaAffx.106447.1.S1_at -0.7516767 3.714621 -4.451899 0.0002001067
0.9979636 -3.291944
1598 Ta.10910.1.A1_at -0.4429634 3.524978 -4.417483 0.0002176579
0.9979636 -3.304519
15208 Ta.25816.1.S1_at -0.4721049 6.752687 -4.246018 0.0003309222
0.9979636 -3.368238
57638 TaAffx.8440.1.S1_at 0.4725505 3.198740 4.106344 0.0004654457
0.9979636 -3.421423
8110 Ta.1735.2.S1_a_at 0.5125682 3.656730 4.019142 0.0005757797
0.9979636 -3.455186
50758 TaAffx.57221.1.S1_at -0.4975976 3.789495 -4.005497 0.0005952539
0.9979636 -3.460506
58501 TaAffx.85842.1.S1_at -0.4646994 3.862172 -4.004599 0.0005965581
0.9979636 -3.460857
30327 Ta.9283.2.S1_at -0.3770003 3.214838 -3.980915 0.0006320034
0.9979636 -3.470117
9090 Ta.18507.1.S1_s_at -0.4499434 2.592315 -3.962537 0.0006609422
0.9979636 -3.477323
44052 TaAffx.29281.1.S1_at -0.5103668 3.360355 -3.933707 0.0007090089
0.9979636 -3.488663
> topTable(fit, coef = "ABSURD", p.value = 0.05)
data frame with 0 columns and 0 rows
Best!
Samuel
=====================================================
Ugh. Stupid line wraps.
So, three things. First, with only three replications you won't be
able to reliably detect very small changes (and the fold changes you
are seeing here are very small, barely even 1.5-fold on the natural
scale).
Second, the unadjusted p-values are pretty big, so it isn't surprising
that you have such large adjusted p-values.
Third, the average expression levels you are showing are really small.
I have no experience with the Wheat array, but for the mammalian
versions of the 3'-biased arrays I am used to seeing much larger
expression values. This may be expected, but if I were you I would be
doing some exploratory data analysis (image plots, density plots, etc,
maybe use arrayQualityMetrics package).
You could also do a PCA plot to see if your samples are grouping as
you expect. I often fit array weights in limma and append them to the
PCA plot to see if weighting might help (see plotPCA, c.f. addtext
argument in affycoretools).
Best,
Jim
+============================================================
James,
new user, how do I keep these on the list serve? Should I just
copy paste the conversation and send it to bioconductor at r-project.org?
Thank you for your suggestion of using PCA!
The embarrassing mistake is that my design matrix did not accurately
reflect the treatments for each chip, because it was sorted using
linux order (1, 10, 11... 2, 20, 21).
I was able to catch this mistake because of the PCA plots (
plotPCA(plot3d = TRUE, pcs = 1:3) really showed the pattern.
Thank you for your help!
More information about the Bioconductor
mailing list