[BioC] limma, voom, weights: factorial versus nested interaction results
Reiner Schulz
reiner.schulz at genetics.kcl.ac.uk
Wed Jan 9 08:59:51 CET 2013
Dear Gordon,
Thanks again. That's very helpful. I am working w/ iCLIP data. In terms
of size, probably similar to SAGE: only 10^5 to 10^6 tags/reads for a
typical human sample. Have applied both edgeR and voom, and the results
do not seem to be widely different; though not what I expected, but
that's another matter.
Best, Reiner
On 06/01/13 05:08, Gordon K Smyth wrote:
> Dear Reiner,
>
> Well, I don't honestly know yet which of edgeR or voom are better in
> what circumstances. I feel that I can't know this until I have pushed
> both methods as far as they can go, something that I am still doing, and
> then have tried them both out on a wide variety of datasets.
>
> I started the edgeR project way back in 2004, before RNA-Seq technology
> even existed. My thought then was that non-normal methods would be
> essential. And I think that it remains true that edgeR has a clear edge
> over other methods for the SAGE data that was available then.
>
> edgeR performs very well, but has not solved all our data analysis
> problems for RNA-Seq data, not yet anyway.
>
> voom has surprised me by performing much better than I expected. It
> solves the biggest problem with normal-based methods by estimating the
> mean-variance relationship adaptively. It holds its size correctly in
> almost any circumstance, scales to very large datasets, can adaptively
> estimate the amount of smoothing necessary, and gives immediate access
> to all the downstream limma pipeline including gene set testing.
>
> My feeling the moment is that edgeR is superior for small counts and but
> that voom is safer and more reliable for noisy heterogeneous data. Only
> edgeR can estimate the biological coefficient of variation (as we
> defined this in our 2012 paper). But we are actively working on both
> methods, and are open to what we find.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Fri, 4 Jan 2013, Reiner Schulz wrote:
>
>> Thank you very much, Gordon.
>>
>> Out of curiosity, why did you develop the voom method when there is
>> edgeR? Are there good, perhaps data-dependent, reasons for using one
>> over the other?
>>
>> Best, Reiner
>>
>> On 03/01/13 23:59, Gordon K Smyth wrote:
>>> Dear Reiner,
>>>
>>> You don't mention contrasts.fit(), but I assume that you are using that
>>> function to get equivalent quantities from the two design matrices.
>>>
>>> If I am guessing your problem correctly, then the issue is that
>>> contrasts.fit() gives results that are not exact in the presence of voom
>>> weights. See the warning note on the help page for contrasts.fit.
>>>
>>> You will get the exact correct t-statistic if the contrast you want is a
>>> coefficient of your design matrix. You should also get exact correct
>>> results when using the one way layout design formula ~0+cond, but you
>>> may get an approximate test statistic if you use a contrast from the
>>> nested formula. The fold changes are exact in all cases.
>>>
>>> There was a thread about this a few years ago, see:
>>>
>>> https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030947.html
>>>
>>>
>>>
>>> Basically I have promised to change contrasts.fit() to give exact
>>> results in all cases, even though this will make it much slower in some
>>> cases. I have not so far found the time to do it, but will definitely
>>> do so by the time we publish the voom method.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>>
>>>
>>>> Date: Wed, 2 Jan 2013 13:31:58 +0100
>>>> From: Reiner Schulz <reiner.schulz at kcl.ac.uk>
>>>> To: <bioconductor at r-project.org>
>>>> Subject: [BioC] limma, voom, weights: factorial versus nested
>>>> interaction results
>>>>
>>>> Dear all,
>>>>
>>>> Am analysing an RNA-seq experiment with a two-way factorial design, and
>>>> am observing different results for two contrasts of interest depending
>>>> on whether I use a factorial model matrix or a nested interaction model
>>>> matrix. I only observe this if the data are pre-processed w/ voom:
>>>>> counts.voom= voom( ...
>>>> i.e., when there are observational level weights passed to lmFit. If I
>>>> only pass counts.voom$E (no observational level weights) to lmFit, the
>>>> results for the two contrasts of interest are identical for the
>>>> factorial and nested interaction model matrices.
>>>> Is this to be expected, and if so, what is the explanation?
>>>>
>>>> Much appreciated, Reiner
>>>>
>>>> PS: some more details ...
>>>>
>>>>> strain
>>>> [1] MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT WT WT WT
>>>> WT WT
>>>> [20] WT WT WT WT WT WT WT
>>>> Levels: WT MUT
>>>>> treat
>>>> [1] U T2 T1 U T2 T1 U U T2 T1 U T2 T1 U T2 T1 U T2 T1 U U T2
>>>> T1 U T2
>>>> [26] T1
>>>> Levels: U T1 T2
>>>>
>>>>> cond= factor( paste( sep='.', strain, treat), lev=c('WT.U', 'MUT.U',
>>>> 'WT.T1', 'MUT.T1', 'WT.T2', 'MUT.T2'))
>>>>> design.matrix= model.matrix( ~0 + cond)
>>>>> design.matrix
>>>> WT.U MUT.U WT.T1 MUT.T1 WT.T2 MUT.T2
>>>> 1 0 1 0 0 0 0
>>>> 2 0 0 0 0 0 1
>>>> 3 0 0 0 1 0 0
>>>> 4 0 1 0 0 0 0
>>>> 5 0 0 0 0 0 1
>>>> 6 0 0 0 1 0 0
>>>> 7 0 1 0 0 0 0
>>>> 8 0 1 0 0 0 0
>>>> 9 0 0 0 0 0 1
>>>> 10 0 0 0 1 0 0
>>>> 11 0 1 0 0 0 0
>>>> 12 0 0 0 0 0 1
>>>> 13 0 0 0 1 0 0
>>>> 14 1 0 0 0 0 0
>>>> 15 0 0 0 0 1 0
>>>> 16 0 0 1 0 0 0
>>>> 17 1 0 0 0 0 0
>>>> 18 0 0 0 0 1 0
>>>> 19 0 0 1 0 0 0
>>>> 20 1 0 0 0 0 0
>>>> 21 1 0 0 0 0 0
>>>> 22 0 0 0 0 1 0
>>>> 23 0 0 1 0 0 0
>>>> 24 1 0 0 0 0 0
>>>> 25 0 0 0 0 1 0
>>>> 26 0 0 1 0 0 0
>>>>
>>>>> design.matrix.alt= model.matrix( ~strain + strain:treat)
>>>>> design.matrix.alt
>>>> Intercept MUT WT:T1 MUT:T1 WT:T2 MUT:T2
>>>> 1 1 1 0 0 0 0
>>>> 2 1 1 0 0 0 1
>>>> 3 1 1 0 1 0 0
>>>> 4 1 1 0 0 0 0
>>>> 5 1 1 0 0 0 1
>>>> 6 1 1 0 1 0 0
>>>> 7 1 1 0 0 0 0
>>>> 8 1 1 0 0 0 0
>>>> 9 1 1 0 0 0 1
>>>> 10 1 1 0 1 0 0
>>>> 11 1 1 0 0 0 0
>>>> 12 1 1 0 0 0 1
>>>> 13 1 1 0 1 0 0
>>>> 14 1 0 0 0 0 0
>>>> 15 1 0 0 0 1 0
>>>> 16 1 0 1 0 0 0
>>>> 17 1 0 0 0 0 0
>>>> 18 1 0 0 0 1 0
>>>> 19 1 0 1 0 0 0
>>>> 20 1 0 0 0 0 0
>>>> 21 1 0 0 0 0 0
>>>> 22 1 0 0 0 1 0
>>>> 23 1 0 1 0 0 0
>>>> 24 1 0 0 0 0 0
>>>> 25 1 0 0 0 1 0
>>>> 26 1 0 1 0 0 0
>>>>
>>>> Observed equalities (as expected):
>>>> MUT == MUT.U - WT.U
>>>> WT:T1 == WT.T1 - WT.U
>>>> MUT:T1 == MUT.T1 - MUT.U
>>>> WT:T2 == WT.T2 - WT.U
>>>> MUT:T2 == MUT.T2 - MUT.U
>>>> MUT:T1 - WT:T1 == (MUT.T1 - MUT.U) - (WT.T1 - WT.U)
>>>> MUT:T2 - WT:T2 == (MUT.T2 - MUT.U) - (WT.T2 - WT.U)
>>>>
>>>> Observed inequalities (only w/ voom pre-processed data; unexpected):
>>>> WT:T1 - WT:T2 != (WT.T1 - WT.U) - (WT.T2 - WT.U)
>>>> MUT:T1 - MUT:T2 != (MUT.T1 - MUT.U) - (MUT.T2 - MUT.U)
>>>>
>>>> --
>>>>
>>>
>>
>> --
>> Reiner Schulz, Ph.D.
>> Senior Lecturer in Bioinformatics and Epigenomics
>> Dept. of Medical and Molecular Genetics
>> King's College London
>> Guy's Hospital, 8th floor Tower Wing
>> London SE1 9RT
>> https://atlas.genetics.kcl.ac.uk/~rschulz
>>
>>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:21}}
More information about the Bioconductor
mailing list