[BioC] Using DESeq to test for null ratios other than 1:1

Simon Anders anders at embl.de
Thu Jan 2 11:14:41 CET 2014

Hi Emily

On 02/01/14 10:27, Emily [guest] wrote:
> I was wondering whether I could use DESeq to test for differences
> between two groups with an inherent bias in their reads proportional
> to each other. By this I mean that instead of each gene being tested
> with the null of 1:1  could I ask the program to test for a gene
> where the null is say 1.5:1 and another gene with a null of 3:4 etc.
> It would be useful to input a list of null ratios for a list of
> genes.

You could use DESeq2's "normalization factors". These are like the size 
factors, but can be specified for each gene separately.

Proceed as follows:

First, estimate the size factors (to account for sequencing depth):

   dds <- estimateSizeFactors( dds )

Then, given a vector 'b' of biases, one for each gene, expand it to a 
matrix with one column for each sample, which is 1 for control samples 
and b for treatment samples. For example:

   biasMatrix <- sapply( pData(dds)$condition, function(cond)
      if( cond=="treated" ) b else rep( 1, length(b) ) )

Then, multiply this matrix with the sifze factors to get the 
normalization factor matrix, and assign this to your DESeq2 object:

   normalizationFactors(dds) <- t( t(biasMatrix) * sizeFactors(dds) )

The double transposition here ensures that the columns, not the rows, 
are multiplied by the size factors. Note also that assigning 
normalization factors overrides the size factors, and this is why we 
have to absorb them into the normalization factors.

Now estimate dispersions and do the test as usual. The result will then 
report fold changes relative to the bias.

To final remarks, just for the record: (Ask again if it applies and you 
need details.)

You are still testing against a point null. If your purpose is to test 
against a banded null (fold change larger than some threshold), there is 
different functionality provided for this.

And if the biases have been estimated from other count data, you should 
probably use a complex design matrix in order to analyse all data in one 
go, in order to account for uncertainty in the biases.


More information about the Bioconductor mailing list