[BioC] Nanostring ncounterdata - DESeq

Mon Jul 25 20:46:19 CEST 2011

Hi Vanessa

On 2011-07-25 13:33, Vanessa Vermeirssen wrote:
> [...]
> Nanostring nCounter data are mRNA counts, like RNA-Seq data, but I wonder
> whether they have the same properties as RNA-Seq data, since I only have
> counts for 110 specifically selected genes. For example, the idea of one
> sample being sampled more deeply than another is less applicable.
> The manufacturer suggested some preprocessing of the data: scaling
> against positive spike-ins, subtracting background (and generating
> absent/present calls).
> In addition, we performed a normalization with 4 housekeeping genes
> (selected out of the 8 included among the 110 genes).
>
> I did the DESeq analysis using these preprocessed data. Is this package
> also appropriate in this case (e.g. the library normalization step)? Is
> the preprocessing correct for this?

In principle, the model used by DESeq should work for any kind of count 
data. The crucial point is that you give it raw count data, i.e., integer 
counts of detected tags, without any normalization, scaling, background 
subtraction or the like.
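Because spike-in scaling and background subtraction typically leave 
non-integer or even negative values, a quick sanity check before building 
the CountDataSet can catch preprocessed input. A minimal base R sketch; 
'checkRawCounts' is a hypothetical helper, not a DESeq function:

```r
# Hypothetical helper (not part of DESeq): stop early if a count matrix
# looks preprocessed rather than raw, e.g. after spike-in scaling or
# background subtraction.
checkRawCounts <- function(counts) {
  counts <- as.matrix(counts)
  if (anyNA(counts))
    stop("count matrix contains NA values")
  # background subtraction can push low counts below zero
  if (any(counts < 0))
    stop("negative values found: background-subtracted data are not raw counts")
  # scaling or normalization produces non-integer values
  if (any(counts != round(counts)))
    stop("non-integer values found: scaled or normalized data are not raw counts")
  invisible(TRUE)
}
```

It would be called right before creating the data set, where 'countTable' 
and 'conditions' stand for your own count matrix and condition factor:
    checkRawCounts(countTable)
    cds <- newCountDataSet(countTable, conditions)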

If you decide against using DESeq's normalization scheme and prefer to 
use Nanostring's, you need to hand the scaling factors calculated by 
their algorithm to DESeq by writing them into the 'sizeFactors' slot of 
the CountDataSet, i.e., use
    sizeFactors(cds) <- c( 1.2, 1.0, .95, ...)
with the scaling factors (one per sample) that you got from the Nanostring 
software, instead of
    cds <- estimateSizeFactors(cds)
Note that DESeq divides the counts by the size factors; if the Nanostring 
factors are meant to be multiplied with the counts, pass their reciprocals.
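Since you normalized with 4 housekeeping genes, one option is to compute 
the size factors from those genes only, using the same median-of-ratios 
idea that estimateSizeFactors applies to all genes: each sample's factor 
is the median, over the chosen genes, of its count divided by that gene's 
geometric mean across samples. The sketch below assumes a raw count 
matrix with gene names as row names; the function name and gene names 
are hypothetical.

```r
# Hypothetical helper (not part of DESeq): median-of-ratios size factors
# computed only on a chosen set of housekeeping genes.
hkSizeFactors <- function(counts, hkGenes) {
  hk <- as.matrix(counts)[hkGenes, , drop = FALSE]
  if (any(hk <= 0))
    stop("housekeeping genes must have positive counts in every sample")
  # log of each housekeeping gene's geometric mean across samples
  logGeoMeans <- rowMeans(log(hk))
  # per sample: median over genes of (count / geometric mean)
  apply(hk, 2, function(x) exp(median(log(x) - logGeoMeans)))
}
```

It could then be used as
    hk <- c("HK1", "HK2", "HK3", "HK4")   # hypothetical gene names
    sizeFactors(cds) <- hkSizeFactors(counts(cds), hk)
With only 4 genes, the median is the mean of the middle two ratios, so a 
single aberrant housekeeping gene has limited influence; whether this is 
preferable to using all genes is a judgment call.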

Look at pairwise MA plots to check whether the normalization worked.
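A pairwise MA plot shows, for two samples, the log2 ratio of the 
normalized counts (M) against their mean log2 value (A); if the 
normalization worked, the bulk of the genes should scatter around M = 0. 
A minimal base R sketch, assuming a raw count matrix and one size factor 
per sample; 'maValues' is a hypothetical helper:

```r
# Hypothetical helper (not part of DESeq): M and A values for samples
# i and j after dividing each sample's counts by its size factor.
maValues <- function(counts, sizeFactors, i, j) {
  counts <- as.matrix(counts)
  norm <- sweep(counts, 2, sizeFactors, "/")
  # the log of zero is undefined, so drop genes with a zero in either sample
  keep <- norm[, i] > 0 & norm[, j] > 0
  li <- log2(norm[keep, i])
  lj <- log2(norm[keep, j])
  data.frame(A = (li + lj) / 2, M = li - lj,
             row.names = rownames(counts)[keep])
}
```

For the first two samples, for example:
    ma <- maValues(counts(cds), sizeFactors(cds), 1, 2)
    plot(ma$A, ma$M, xlab = "A", ylab = "M")
    abline(h = 0, col = "red")
and likewise for the other pairs of samples.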

There is, however, no way to incorporate background correction into 
DESeq, because it is not needed for RNA-Seq analysis. With Nanostring, 
you have cross-hybridization and hence background. If the background 
level is the same in all samples, it should not influence your 
differential expression calculation, and you can ignore it. Otherwise, it 
can distort the comparison, especially for genes with low counts.

> In addition, I did t-tests on the log2 data (paired, and unpaired with
> equal variance, which I tested for), because this approach has been
> described in the literature before.

This is no big surprise. With only a few replicates, a standard t-test 
does not have much power. This was, after all, the motivation behind the 
development of limma.

> Another paper describes an FDR permutation approach; they don't seem
> to have any biological replicates, but they do have 32 control
> experiments and 10 control genes (Amit et al., 2009).
> I also tried this on our data.

I'm a bit puzzled how a permutation test without replicates might work, 
but I don't know the paper.

> [...]
> A minor question relates to the preprocessing. How should I deal with
> the absent/present calls obtained from the preprocessing in the
> statistical analysis? Should I include absent values as NAs from the
> beginning, or re-evaluate the results at the end?

Hard to say. As I am not familiar with the technology, I don't know what 
they base these calls on. As you cannot incorporate background 
correction anyway, it might be best to ignore the absent/present 
calls as well and use all data.

Please let us know whether it works.

   Simon