[BioC] Using DESeq to test for null ratios other than 1:1
anders at embl.de
Thu Jan 2 11:14:41 CET 2014
On 02/01/14 10:27, Emily [guest] wrote:
> I was wondering whether I could use DESeq to test for differences
> between two groups with an inherent bias in their reads proportional
> to each other. By this I mean that instead of each gene being tested
> with the null of 1:1 could I ask the program to test for a gene
> where the null is say 1.5:1 and another gene with a null of 3:4 etc.
> It would be useful to input a list of null ratios for a list of
You could use DESeq2's "normalization factors". These are like the size
factors, but can be specified for each gene separately.
Proceed as follows:
First, estimate the size factors (to account for sequencing depth):
dds <- estimateSizeFactors( dds )
Then, given a vector 'b' of biases, one for each gene, expand it to a
matrix with one column for each sample, which is 1 for control samples
and b for treatment samples. For example:
biasMatrix <- sapply( pData(dds)$condition, function(cond)
if( cond=="treated" ) b else rep( 1, length(b) ) )
Then, multiply this matrix with the sifze factors to get the
normalization factor matrix, and assign this to your DESeq2 object:
normalizationFactors(dds) <- t( t(biasMatrix) * sizeFactors(dds) )
The double transposition here ensures that the columns, not the rows,
are multiplied by the size factors. Note also that assigning
normalization factors overrides the size factors, and this is why we
have to absorb them into the normalization factors.
Now estimate dispersions and do the test as usual. The result will then
report fold changes relative to the bias.
To final remarks, just for the record: (Ask again if it applies and you
You are still testing against a point null. If your purpose is to test
against a banded null (fold change larger than some threshold), there is
different functionality provided for this.
And if the biases have been estimated from other count data, you should
probably use a complex design matrix in order to analyse all data in one
go, in order to account for uncertainty in the biases.
More information about the Bioconductor