[BioC] limma, voom, weights: factorial versus nested interaction results

Wed Jan 9 08:59:51 CET 2013

Dear Gordon,

Thanks again. That's very helpful. I am working w/ iCLIP data. In terms
of size, probably similar to SAGE: only 10^5 to 10^6 tags/reads for a
typical human sample. Have applied both edgeR and voom, and the results
do not seem to be widely different; though not what I expected, but
that's another matter.

Best, Reiner

On 06/01/13 05:08, Gordon K Smyth wrote:
> Dear Reiner,
> 
> Well, I don't honestly know yet which of edgeR or voom are better in
> what circumstances.  I feel that I can't know this until I have pushed
> both methods as far as they can go, something that I am still doing, and
> then have tried them both out on a wide variety of datasets.
> 
> I started the edgeR project way back in 2004, before RNA-Seq technology
> even existed.  My thought then was that non-normal methods would be
> essential.  And I think that it remains true that edgeR has a clear edge
> over other methods for the SAGE data that was available then.
> 
> edgeR performs very well, but has not solved all our data analysis
> problems for RNA-Seq data, not yet anyway.
> 
> voom has surprised me by performing much better than I expected.  It
> solves the biggest problem with normal-based methods by estimating the
> mean-variance relationship adaptively.  It holds its size correctly in
> almost any circumstance, scales to very large datasets, can adaptively
> estimate the amount of smoothing necessary, and gives immediate access
> to all the downstream limma pipeline including gene set testing.
> 
> My feeling the moment is that edgeR is superior for small counts and but
> that voom is safer and more reliable for noisy heterogeneous data.  Only
> edgeR can estimate the biological coefficient of variation (as we
> defined this in our 2012 paper).  But we are actively working on both
> methods, and are open to what we find.
> 
> Best wishes
> Gordon
> 
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
> 
> On Fri, 4 Jan 2013, Reiner Schulz wrote:
> 
>> Thank you very much, Gordon.
>>
>> Out of curiosity, why did you develop the voom method when there is
>> edgeR? Are there good, perhaps data-dependent, reasons for using one
>> over the other?
>>
>> Best, Reiner
>>
>> On 03/01/13 23:59, Gordon K Smyth wrote:
>>> Dear Reiner,
>>>
>>> You don't mention contrasts.fit(), but I assume that you are using that
>>> function to get equivalent quantities from the two design matrices.
>>>
>>> If I am guessing your problem correctly, then the issue is that
>>> contrasts.fit() gives results that are not exact in the presence of voom
>>> weights.  See the warning note on the help page for contrasts.fit.
>>>
>>> You will get the exact correct t-statistic if the contrast you want is a
>>> coefficient of your design matrix.  You should also get exact correct
>>> results when using the one way layout design formula ~0+cond, but you
>>> may get an approximate test statistic if you use a contrast from the
>>> nested formula.  The fold changes are exact in all cases.
>>>
>>> There was a thread about this a few years ago, see:
>>>
>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030947.html
>>>
>>>
>>>
>>> Basically I have promised to change contrasts.fit() to give exact
>>> results in all cases, even though this will make it much slower in some
>>> cases.  I have not so far found the time to do it, but will definitely
>>> do so by the time we publish the voom method.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>>
>>>
>>>> Date: Wed, 2 Jan 2013 13:31:58 +0100
>>>> From: Reiner Schulz <reiner.schulz at kcl.ac.uk>
>>>> To: <bioconductor at r-project.org>
>>>> Subject: [BioC] limma, voom,    weights: factorial versus nested
>>>>     interaction results
>>>>
>>>> Dear all,
>>>>
>>>> Am analysing an RNA-seq experiment with a two-way factorial design, and
>>>> am observing different results for two contrasts of interest depending
>>>> on whether I use a factorial model matrix or a nested interaction model
>>>> matrix. I only observe this if the data are pre-processed w/ voom:
>>>>> counts.voom= voom( ...
>>>> i.e., when there are observational level weights passed to lmFit. If I
>>>> only pass counts.voom$E (no observational level weights) to lmFit, the
>>>> results for the two contrasts of interest are identical for the
>>>> factorial and nested interaction model matrices.
>>>> Is this to be expected, and if so, what is the explanation?
>>>>
>>>> Much appreciated, Reiner
>>>>
>>>> PS: some more details ...
>>>>
>>>>> strain
>>>> [1] MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT  WT  WT  WT
>>>> WT  WT
>>>> [20] WT  WT  WT  WT  WT  WT  WT
>>>> Levels: WT MUT
>>>>> treat
>>>> [1] U  T2 T1 U  T2 T1 U  U  T2 T1 U  T2 T1 U  T2 T1 U  T2 T1 U  U  T2
>>>> T1 U  T2
>>>> [26] T1
>>>> Levels: U T1 T2
>>>>
>>>>> cond= factor( paste( sep='.', strain, treat), lev=c('WT.U', 'MUT.U',
>>>> 'WT.T1', 'MUT.T1', 'WT.T2', 'MUT.T2'))
>>>>> design.matrix= model.matrix( ~0 + cond)
>>>>> design.matrix
>>>>   WT.U MUT.U WT.T1 MUT.T1 WT.T2 MUT.T2
>>>> 1  0    1     0     0      0      0
>>>> 2  0    0     0     0      0      1
>>>> 3  0    0     0     1      0      0
>>>> 4  0    1     0     0      0      0
>>>> 5  0    0     0     0      0      1
>>>> 6  0    0     0     1      0      0
>>>> 7  0    1     0     0      0      0
>>>> 8  0    1     0     0      0      0
>>>> 9  0    0     0     0      0      1
>>>> 10 0    0     0     1      0      0
>>>> 11 0    1     0     0      0      0
>>>> 12 0    0     0     0      0      1
>>>> 13 0    0     0     1      0      0
>>>> 14 1    0     0     0      0      0
>>>> 15 0    0     0     0      1      0
>>>> 16 0    0     1     0      0      0
>>>> 17 1    0     0     0      0      0
>>>> 18 0    0     0     0      1      0
>>>> 19 0    0     1     0      0      0
>>>> 20 1    0     0     0      0      0
>>>> 21 1    0     0     0      0      0
>>>> 22 0    0     0     0      1      0
>>>> 23 0    0     1     0      0      0
>>>> 24 1    0     0     0      0      0
>>>> 25 0    0     0     0      1      0
>>>> 26 0    0     1     0      0      0
>>>>
>>>>> design.matrix.alt= model.matrix( ~strain + strain:treat)
>>>>> design.matrix.alt
>>>>   Intercept MUT WT:T1 MUT:T1 WT:T2 MUT:T2
>>>> 1  1         1   0     0      0     0
>>>> 2  1         1   0     0      0     1
>>>> 3  1         1   0     1      0     0
>>>> 4  1         1   0     0      0     0
>>>> 5  1         1   0     0      0     1
>>>> 6  1         1   0     1      0     0
>>>> 7  1         1   0     0      0     0
>>>> 8  1         1   0     0      0     0
>>>> 9  1         1   0     0      0     1
>>>> 10 1         1   0     1      0     0
>>>> 11 1         1   0     0      0     0
>>>> 12 1         1   0     0      0     1
>>>> 13 1         1   0     1      0     0
>>>> 14 1         0   0     0      0     0
>>>> 15 1         0   0     0      1     0
>>>> 16 1         0   1     0      0     0
>>>> 17 1         0   0     0      0     0
>>>> 18 1         0   0     0      1     0
>>>> 19 1         0   1     0      0     0
>>>> 20 1         0   0     0      0     0
>>>> 21 1         0   0     0      0     0
>>>> 22 1         0   0     0      1     0
>>>> 23 1         0   1     0      0     0
>>>> 24 1         0   0     0      0     0
>>>> 25 1         0   0     0      1     0
>>>> 26 1         0   1     0      0     0
>>>>
>>>> Observed equalities (as expected):
>>>> MUT == MUT.U - WT.U
>>>> WT:T1 == WT.T1 - WT.U
>>>> MUT:T1 == MUT.T1 - MUT.U
>>>> WT:T2 == WT.T2 - WT.U
>>>> MUT:T2 == MUT.T2 - MUT.U
>>>> MUT:T1 - WT:T1 == (MUT.T1 - MUT.U) - (WT.T1 - WT.U)
>>>> MUT:T2 - WT:T2 == (MUT.T2 - MUT.U) - (WT.T2 - WT.U)
>>>>
>>>> Observed inequalities (only w/ voom pre-processed data; unexpected):
>>>> WT:T1 - WT:T2 != (WT.T1 - WT.U) - (WT.T2 - WT.U)
>>>> MUT:T1 - MUT:T2 != (MUT.T1 - MUT.U) - (MUT.T2 - MUT.U)
>>>>
>>>> -- 
>>>>
>>>
>>
>> -- 
>> Reiner Schulz, Ph.D.
>> Senior Lecturer in Bioinformatics and Epigenomics
>> Dept. of Medical and Molecular Genetics
>> King's College London
>> Guy's Hospital, 8th floor Tower Wing
>> London SE1 9RT
>> https://atlas.genetics.kcl.ac.uk/~rschulz
>>
>>
> 
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:21}}