[BioC] Behaviour of weights in limma

Sun Jun 8 05:01:27 CEST 2014

On Thu, Jun 5, 2014 at 10:04 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Paul,
>
>> Date: Wed, 4 Jun 2014 17:30:59 +1000
>> From: Paul Harrison <Paul.Harrison at monash.edu>
>> To: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: [BioC] Behaviour of weights in limma
>>
>> Hello,
>>
>> I have some data from a variant of RNA-seq on which I am hoping to do some
>> moderated t-test differential testing with limma. In this data, many of
>> the reads have sequenced through into the poly(A) tail, and we believe this
>> gives us information about changes in poly(A) tail length.
>>
>> For each gene and sample, we can calculate an average observed tail
>> length. It seems easy enough to calculate a standard error for this average
>> as well.
>
>
> I don't think that you can actually calculate a meaningful standard error.
> The total error depends on both biological and technical components.  You
> can predict how the measurement error depends on the number of reads, but
> you don't know what proportion of the total error the measurement error
> makes up.
>

Yes. Sorry, I meant that the technical error is fairly accurately known.

>> In some cases we have few reads and the standard error is high, in others
>> we have quite a lot of reads and the standard error is low.
>>
>> What I'm hoping is that this can be translated into weights that can be
>> fed to limma to make it behave correctly. Do weights have some specific
>> meaning in terms of measurement variance?
>
>
> They have a specific meaning, but it is in terms of total variance not in
> terms of measurement variance.
>
> The meaning of weights in limma is the same as for any linear modelling or
> regression procedures, which is that the total variance is assumed inversely
> proportional to the weight.
>

Ah.

>> And how does this interact with moderation between genes,
>
>
> Intimately.
>
>> for example could including highly noisy measurements from some genes
>> detract from the significance of other genes where the measurement is more
>> precise?
>
>
> Yes.
>

Would that also mean that, even with conventional RNA-seq data, it could
be worthwhile to filter out low-coverage genes before applying voom
and limma?

> Could you not simply use voom or edgeR, both of which already do what you
> seem to be asking, which is to take the number of reads into account when
> estimating variability and assessing DE?
>

To use voom I would need to alter the voom function to take the
average tail lengths as a parameter in addition to counts. This looks
fairly straightforward.

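Given what you've said above, an alternative would be to estimate a
single constant value for the biological variance (for example by
maximum likelihood), purely for the purpose of calculating weights.
Each weight would then be inversely proportional to that constant plus
the known technical variance of the measurement.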
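
A minimal sketch of that idea, written in plain Python only to make the
arithmetic concrete. It is not limma code. The function names are
hypothetical, the residuals are assumed to be normally distributed around
zero with variance (biological variance + technical SE^2), and the degrees
of freedom used up by fitting the linear model are ignored:

```python
import math


def neg_log_lik(bio_var, residuals, tech_se):
    """Negative log-likelihood (up to a constant), assuming each residual
    is N(0, bio_var + se^2). Requires every technical SE to be > 0."""
    total = 0.0
    for r, se in zip(residuals, tech_se):
        var = bio_var + se * se
        total += 0.5 * (math.log(var) + r * r / var)
    return total


def estimate_bio_var(residuals, tech_se, tol=1e-6):
    """Golden-section search for a single constant biological variance.
    The likelihood is decreasing beyond max(r^2), so [0, max(r^2)]
    is a safe bracket for the search."""
    lo, hi = 0.0, max(r * r for r in residuals)
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if neg_log_lik(a, residuals, tech_se) < neg_log_lik(b, residuals, tech_se):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return (lo + hi) / 2


def total_variance_weights(tech_se, bio_var):
    """Weights inversely proportional to total (biological + technical) variance."""
    return [1.0 / (bio_var + se * se) for se in tech_se]


# Made-up numbers, for illustration only.
residuals = [1.0, -1.0, 2.0, -2.0]
tech_se = [0.5, 0.5, 0.5, 0.5]
bio_var = estimate_bio_var(residuals, tech_se)
weights = total_variance_weights(tech_se, bio_var)
```

The resulting weights could then be supplied as observation weights to
limma. With a constant biological variance, poorly covered measurements
are down-weighted but never to zero, and well covered ones cannot receive
arbitrarily large weights.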

Thank you,

Paul Harrison

Victorian Bioinformatics Consortium / Monash University