[BioC] limma: instances of highly variable paired ratios, but very small p-values
Gordon K Smyth
smyth at wehi.EDU.AU
Mon Nov 19 01:06:05 CET 2012
Dear Ryan,
Thanks, this is much clearer now.
There isn't any contradiction in the results you give, and the results are
essentially as I would want them to be.
The aim of the moderated t-test is to test whether the log-ratios are
consistently greater or lesser than zero, not to test whether the
log-ratios are equal. For each of the three genes you give, the three
pairwise log-ratios are consistently greater than or less than zero. For
each gene, the three log-ratios are of the same sign, some quite large and
none very small. Hence we would conclude that the treatment (sleep
duration) does have a worthwhile effect for these genes.
Note that the moderated t-test does not penalize a gene for having a large
variance as much as the ordinary t-test would do. Compared to the
ordinary t-test, the moderated t-test gives more weight to the treatment
fold change. It can therefore be viewed as a compromise between the
ordinary t-test and ranking by fold-change.
To see whether the compromise is being drawn, look at fit$df.prior. If
this equal to 3 (the residual df for you experiment), then the moderated
t-test is giving equal weight to the genewise variance and to the global
variance. If df.prior is much larger than 3, then the moderated t-test is
using mostly the global variance. The larger the df.prior, the closer the
moderated t-test will come to ranking genes by fold change. The smaller
the df.prior, the closer the moderated t-test will come to ranking by
ordinary t-test. Larger genewise variances are penalized more when
df.prior is low and less when df.prior is large.
I don't see a strong reason to suspect a problem in your analysis, but I
should say that I do not necessarily recommend filtering by variance in
conjunction with the eBayes procedure. There is a risk that, by removing
genes with small overall variances, you are changing the natural
properties of the data, leading limma to think that the variances are more
consistent than they actually are, and leading limma to set df.prior
larger than would be ideal. You will almost certainly find that, if you
remove the filter-by-variance step, then limma will choose a lower value
for df.prior than at present.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Sun, 18 Nov 2012, Ryan Basom wrote:
> Dear Gordon,
>
> I apologize about the lack of clarity in my initial post. In my
> experiment, I'm working with three pairs of twins that differ in sleep
> duration. In my targets file, the twins are paired by "Source," and are
> grouped by sleep duration - either "short" or "long." I want try to
> identify significantly differentially expressed probes between the "short"
> and "long" groups.
>
>> targets
> Sample Group Source
> 1 short.A short A
> 2 short.B short B
> 3 short.C short C
> 4 long.A long A
> 5 long.B long B
> 6 long.C long C
>
>
> "d" is a data.frame that contains quantile normalized, log2 scaled signal
> intensity values, with samples represented in columns. The data set has
> undergone signal intensity and variance filtering steps, resulting in
> 13,597 rows of values that were used with limma.
>
>> head(d)
> long.A long.B long.C short.A short.B short.C
> 1690577 9.717804 10.068060 9.969054 11.74844 10.84940 12.46522
> 2120040 9.234160 9.415976 9.463955 10.95052 10.42986 11.84757
> 3180438 9.724018 9.370251 9.758478 11.32368 10.51315 11.97926
> 3130324 8.980456 9.163279 9.203548 10.57555 10.13772 11.71771
> 6380255 9.624958 9.118592 9.364389 11.02301 10.20351 11.85628
> 6560164 10.097376 9.852433 10.089217 11.55226 10.78285 12.20962
>
>
> design<-model.matrix(~targets$Source+targets$Group)
> fit <- lmFit(d, design)
> fit <- eBayes(fit)
> results<-topTable(fit, coef="targets$Groupshort", number=nrow(d))
>
> What I'm puzzled about are instances of probes with very small p-values,
> when the sample paired ratios vary quite a bit. Below are a few examples.
>
>> exampleProbes<-c(1450390,1230376,3420451)
>> results[results$ID %in% exampleProbes,]
> ID logFC AveExpr t P.Value adj.P.Val B
> 13 1450390 -1.373309 10.210502 -9.807005 4.353149e-17 4.553059e-14 28.26206
> 14 1230376 1.439599 10.006936 9.785405 4.905855e-17 4.764637e-14 28.14655
> 18 3420451 -1.258524 8.028813 -8.903744 6.262297e-15 4.730470e-12 23.45921
>
>
>> signalIntensities<-d[row.names(d) %in% exampleProbes,]
>> signalIntensities
> long.A long.B long.C short.A short.B short.C
> 1450390 9.561842 9.508576 9.501124 10.839129 10.179997 11.672343
> 1230376 9.108591 10.499186 12.572430 8.752280 8.271599 10.837532
> 3420451 7.534823 7.447136 7.216693 8.519415 8.080696 9.374113
>
>> ratios<-signalIntensities[1:3]-signalIntensities[4:6]
>> ratios
> long.A long.B long.C
> 1450390 -1.2772877 -0.6714210 -2.171219
> 1230376 0.3563107 2.2275875 1.734898
> 3420451 -0.9845919 -0.6335606 -2.157420
>
>
> Thank you very much for any assistance with this.
>
> Best,
>
> Ryan
>
>
>
>
> On Fri, Nov 16, 2012 at 10:36 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Ryan,
>>
>> The limma algorithm is very well understood by now, and there are many
>> bioinformatications at the FHCRC who could probably answer your question.
>>
>> I find it hard to make a response to your email because I just see a
>> jumble of numbers. I don't have a firm understanding of what your
>> experimental design is, what model you've fitted, what the numbers are that
>> you've calculated, or what you want me to see. To comment more, I'd need
>> to see your experimental design and the complete pipeline of commands used
>> to generate the output given.
>>
>> Best wishes
>> Gordon
>>
>> I don't a firm idea of what your experimental design
>>
>> Date: Wed, 14 Nov 2012 17:23:43 -0800
>>> From: Ryan Basom <rbasom at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] limma: instances of highly variable paired ratios, but
>>> very small p-values
>>>
>>> Hi,
>>>
>>> I've performed a limma analysis on paired samples that were run on
>>> Illumina HT12 arrays, with three replicates in each condition. I'm a bit
>>> troubled by the results though, as there are several probes that have very
>>> small adjusted p-values, though when looking at the paired ratio values,
>>> they vary quite a bit. Here are a few examples where the comparison is
>>> long-short, and the samples are paired by the letters A,B,C. After the
>>> adj.P.Val column, I've calculated the paired sample ratio values, these
>>> three columns are followed by the signal intensities from each sample:
>>>
>>> ProbeID TargetID logFC AveExpr P.Value adj.P.Val long-short.A
>>> long-short.B long-short.C long.A long.B long.C short.A short.B short.C
>>> 1450390 RPL17 -1.3733092649 10.2105020267 4.35314891863083e-17
>>> 4.55305891127872e-14 -1.277287712 -0.6714209686 -2.1712191142 9.5618416199
>>> 9.5085763086 9.5011242541 10.8391293319 10.1799972771 11.6723433683 1230376
>>> ALAS2 1.4395987013 10.0069363572 4.9058551374517e-17 4.76463659313792e-14
>>> 0.356310701 2.2275874983 1.7348979044 9.1085909827 10.4991863428
>>> 12.5724297981 8.7522802817 8.2715988445 10.8375318936 3420451 RSL24D1
>>> -1.2585240828 8.0288125742 6.26229691539969e-15 4.73046950881609e-12
>>> -0.9845918613 -0.6335605827 -2.1574198045 7.5348228063 7.4471358372
>>> 7.2166929547 8.5194146676 8.0806964199 9.3741127592
>>>
>>>
>>> I'd assumed that limma would have been more sensitive to this and am
>>> wondering if anyone could please explain why this may be occurring.
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>> __
>>> Ryan Basom
>>> Systems Analyst/Programmer
>>> Genomics Resource
>>> Fred Hutchinson Cancer Research Center
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list