[BioC] limma, voom, weights: factorial versus nested interaction results

Ryan C. Thompson rct at thompsonclan.org
Wed Jan 9 20:03:17 CET 2013


Hi Gordon,

Thanks for the great explanation. Could you expand on what you mean by 
"noisy heterogeneous data"? That term would seem to describe just about 
any RNA-seq data set to some degree, but I assume you have something 
specific in mind. Are you talking about data with a large BCV, or maybe 
data where different genes have widely-varying dispersions?

Also, if it's not too much trouble, could you expand on how voom changes 
the underlying assumptions of limma? Specifically, which of the 
assumptions of ANOVA are relaxed and which are not changed? It still 
confuses me that a normal-based method could work on data that are not 
expected to be normal, and it's not clear to me how estimating the 
mean-variance relationship adaptively would make up for using the 
"wrong" distribution (normal vs negative binomial).

Thanks,

-Ryan Thompson


On 01/05/2013 09:08 PM, Gordon K Smyth wrote:
> Dear Reiner,
>
> Well, I don't honestly know yet which of edgeR or voom are better in 
> what circumstances.  I feel that I can't know this until I have pushed 
> both methods as far as they can go, something that I am still doing, 
> and then have tried them both out on a wide variety of datasets.
>
> I started the edgeR project way back in 2004, before RNA-Seq 
> technology even existed.  My thought then was that non-normal methods 
> would be essential.  And I think that it remains true that edgeR has a 
> clear edge over other methods for the SAGE data that was available then.
>
> edgeR performs very well, but has not solved all our data analysis 
> problems for RNA-Seq data, not yet anyway.
>
> voom has surprised me by performing much better than I expected. It 
> solves the biggest problem with normal-based methods by estimating the 
> mean-variance relationship adaptively.  It holds its size correctly in 
> almost any circumstance, scales to very large datasets, can adaptively 
> estimate the amount of smoothing necessary, and gives immediate access 
> to all the downstream limma pipeline including gene set testing.
>
> My feeling the moment is that edgeR is superior for small counts and 
> but that voom is safer and more reliable for noisy heterogeneous 
> data.  Only edgeR can estimate the biological coefficient of variation 
> (as we defined this in our 2012 paper). But we are actively working on 
> both methods, and are open to what we find.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Fri, 4 Jan 2013, Reiner Schulz wrote:
>
>> Thank you very much, Gordon.
>>
>> Out of curiosity, why did you develop the voom method when there is
>> edgeR? Are there good, perhaps data-dependent, reasons for using one
>> over the other?
>>
>> Best, Reiner
>>
>> On 03/01/13 23:59, Gordon K Smyth wrote:
>>> Dear Reiner,
>>>
>>> You don't mention contrasts.fit(), but I assume that you are using that
>>> function to get equivalent quantities from the two design matrices.
>>>
>>> If I am guessing your problem correctly, then the issue is that
>>> contrasts.fit() gives results that are not exact in the presence of 
>>> voom
>>> weights.  See the warning note on the help page for contrasts.fit.
>>>
>>> You will get the exact correct t-statistic if the contrast you want 
>>> is a
>>> coefficient of your design matrix.  You should also get exact correct
>>> results when using the one way layout design formula ~0+cond, but you
>>> may get an approximate test statistic if you use a contrast from the
>>> nested formula.  The fold changes are exact in all cases.
>>>
>>> There was a thread about this a few years ago, see:
>>>
>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030947.html 
>>>
>>>
>>>
>>> Basically I have promised to change contrasts.fit() to give exact
>>> results in all cases, even though this will make it much slower in some
>>> cases.  I have not so far found the time to do it, but will definitely
>>> do so by the time we publish the voom method.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>>
>>>
>>>> Date: Wed, 2 Jan 2013 13:31:58 +0100
>>>> From: Reiner Schulz <reiner.schulz at kcl.ac.uk>
>>>> To: <bioconductor at r-project.org>
>>>> Subject: [BioC] limma, voom,    weights: factorial versus nested
>>>>     interaction results
>>>>
>>>> Dear all,
>>>>
>>>> Am analysing an RNA-seq experiment with a two-way factorial design, 
>>>> and
>>>> am observing different results for two contrasts of interest depending
>>>> on whether I use a factorial model matrix or a nested interaction 
>>>> model
>>>> matrix. I only observe this if the data are pre-processed w/ voom:
>>>>> counts.voom= voom( ...
>>>> i.e., when there are observational level weights passed to lmFit. If I
>>>> only pass counts.voom$E (no observational level weights) to lmFit, the
>>>> results for the two contrasts of interest are identical for the
>>>> factorial and nested interaction model matrices.
>>>> Is this to be expected, and if so, what is the explanation?
>>>>
>>>> Much appreciated, Reiner
>>>>
>>>> PS: some more details ...
>>>>
>>>>> strain
>>>> [1] MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT WT  WT  WT
>>>> WT  WT
>>>> [20] WT  WT  WT  WT  WT  WT  WT
>>>> Levels: WT MUT
>>>>> treat
>>>> [1] U  T2 T1 U  T2 T1 U  U  T2 T1 U  T2 T1 U  T2 T1 U  T2 T1 U  U  T2
>>>> T1 U  T2
>>>> [26] T1
>>>> Levels: U T1 T2
>>>>
>>>>> cond= factor( paste( sep='.', strain, treat), lev=c('WT.U', 'MUT.U',
>>>> 'WT.T1', 'MUT.T1', 'WT.T2', 'MUT.T2'))
>>>>> design.matrix= model.matrix( ~0 + cond)
>>>>> design.matrix
>>>>   WT.U MUT.U WT.T1 MUT.T1 WT.T2 MUT.T2
>>>> 1  0    1     0     0      0      0
>>>> 2  0    0     0     0      0      1
>>>> 3  0    0     0     1      0      0
>>>> 4  0    1     0     0      0      0
>>>> 5  0    0     0     0      0      1
>>>> 6  0    0     0     1      0      0
>>>> 7  0    1     0     0      0      0
>>>> 8  0    1     0     0      0      0
>>>> 9  0    0     0     0      0      1
>>>> 10 0    0     0     1      0      0
>>>> 11 0    1     0     0      0      0
>>>> 12 0    0     0     0      0      1
>>>> 13 0    0     0     1      0      0
>>>> 14 1    0     0     0      0      0
>>>> 15 0    0     0     0      1      0
>>>> 16 0    0     1     0      0      0
>>>> 17 1    0     0     0      0      0
>>>> 18 0    0     0     0      1      0
>>>> 19 0    0     1     0      0      0
>>>> 20 1    0     0     0      0      0
>>>> 21 1    0     0     0      0      0
>>>> 22 0    0     0     0      1      0
>>>> 23 0    0     1     0      0      0
>>>> 24 1    0     0     0      0      0
>>>> 25 0    0     0     0      1      0
>>>> 26 0    0     1     0      0      0
>>>>
>>>>> design.matrix.alt= model.matrix( ~strain + strain:treat)
>>>>> design.matrix.alt
>>>>   Intercept MUT WT:T1 MUT:T1 WT:T2 MUT:T2
>>>> 1  1         1   0     0      0     0
>>>> 2  1         1   0     0      0     1
>>>> 3  1         1   0     1      0     0
>>>> 4  1         1   0     0      0     0
>>>> 5  1         1   0     0      0     1
>>>> 6  1         1   0     1      0     0
>>>> 7  1         1   0     0      0     0
>>>> 8  1         1   0     0      0     0
>>>> 9  1         1   0     0      0     1
>>>> 10 1         1   0     1      0     0
>>>> 11 1         1   0     0      0     0
>>>> 12 1         1   0     0      0     1
>>>> 13 1         1   0     1      0     0
>>>> 14 1         0   0     0      0     0
>>>> 15 1         0   0     0      1     0
>>>> 16 1         0   1     0      0     0
>>>> 17 1         0   0     0      0     0
>>>> 18 1         0   0     0      1     0
>>>> 19 1         0   1     0      0     0
>>>> 20 1         0   0     0      0     0
>>>> 21 1         0   0     0      0     0
>>>> 22 1         0   0     0      1     0
>>>> 23 1         0   1     0      0     0
>>>> 24 1         0   0     0      0     0
>>>> 25 1         0   0     0      1     0
>>>> 26 1         0   1     0      0     0
>>>>
>>>> Observed equalities (as expected):
>>>> MUT == MUT.U - WT.U
>>>> WT:T1 == WT.T1 - WT.U
>>>> MUT:T1 == MUT.T1 - MUT.U
>>>> WT:T2 == WT.T2 - WT.U
>>>> MUT:T2 == MUT.T2 - MUT.U
>>>> MUT:T1 - WT:T1 == (MUT.T1 - MUT.U) - (WT.T1 - WT.U)
>>>> MUT:T2 - WT:T2 == (MUT.T2 - MUT.U) - (WT.T2 - WT.U)
>>>>
>>>> Observed inequalities (only w/ voom pre-processed data; unexpected):
>>>> WT:T1 - WT:T2 != (WT.T1 - WT.U) - (WT.T2 - WT.U)
>>>> MUT:T1 - MUT:T2 != (MUT.T1 - MUT.U) - (MUT.T2 - MUT.U)
>>>>
>>>> -- 
>>>>
>>>
>>
>> -- 
>> Reiner Schulz, Ph.D.
>> Senior Lecturer in Bioinformatics and Epigenomics
>> Dept. of Medical and Molecular Genetics
>> King's College London
>> Guy's Hospital, 8th floor Tower Wing
>> London SE1 9RT
>> https://atlas.genetics.kcl.ac.uk/~rschulz
>>
>>
>
> ______________________________________________________________________
> The information in this email is confidential and intend...{{dropped:4}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list