[BioC] Nanostring ncounterdata - DESeq
Simon Anders
anders at embl.de
Mon Jul 25 20:46:19 CEST 2011
Hi Vanessa
On 2011-07-25 13:33, Vanessa Vermeirssen wrote:
> [...]
> Nanostring nCounter data are mRNA counts, like RNA-Seq, but I wonder if
> they have the same properties like RNASeq data i.e. I do only have the
> counts for 110 specifically selected genes. The deeper sampling of one
> sample compared to another e.g. is less applicable.
> The manufacturer suggested some preprocessing of the data: scaling
> against positive spike-ins, subtracting background (and absent/present
> call generation).
> In addition, we performed a normalization with 4 household genes
> (selected out of the 8 included in the 110 genes).
>
> I did the DESeq package analysis using these preprocessed data, is this
> package also appropriate in this case (e.g. the library normalization
> step?)? Is the preprocessing correct for this?
In principle, the model used by DESeq should work for any kind of count
data. The crucial part is that you give it count data, i.e., integer
counts of detected tags, without any normalization or the like.
If you decide against using DESeq's normalization scheme and prefer to
use Nanostring's, you need to hand the scaling factors calculated by
their algorithm to DESeq by writing them into the 'sizeFactors' slot of
the CountDataSet, i.e., use

   sizeFactors(cds) <- c( 1.2, 1.0, .95, ...)

with the scaling factors that you somehow got from the Nanostring
software, instead of

   cds <- estimateSizeFactors(cds)

Look at pairwise MA plots to check whether the normalization worked.
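
As a rough illustration of such a check, here is a minimal base-R sketch that does not use DESeq itself. Everything in it is made up for illustration: the simulated counts, the two scaling factors, and the helper name ma_values. It assumes the factors follow DESeq's convention, in which raw counts are divided by the size factor of their sample; if your software reports factors that are meant to be multiplied onto the counts, you would presumably need their reciprocals.

```r
# Hypothetical example data: 110 genes, 2 samples, simulated counts.
set.seed(1)
n_genes <- 110
mu <- rexp(n_genes, rate = 1 / 200) + 5     # made-up expression levels
size_factors <- c(s1 = 1.2, s2 = 0.95)      # made-up scaling factors
counts <- sapply(size_factors,
                 function(s) rnbinom(n_genes, mu = mu * s, size = 10))

# DESeq's convention: divide raw counts by the size factor of each sample.
normalized <- sweep(counts, 2, size_factors, "/")

# M (log ratio) and A (mean log expression) for one pair of samples;
# the pseudocount of 1 avoids log2(0).
ma_values <- function(x, y) {
  lx <- log2(x + 1)
  ly <- log2(y + 1)
  data.frame(A = (lx + ly) / 2, M = lx - ly)
}
ma_raw  <- ma_values(counts[, 1], counts[, 2])
ma_norm <- ma_values(normalized[, 1], normalized[, 2])

# After a successful normalization, the cloud should be centred on M = 0.
plot(ma_norm$A, ma_norm$M, xlab = "A", ylab = "M")
abline(h = 0, lty = 2)
```

With real data you would repeat this for every pair of samples; a cloud that sits clearly above or below the dashed line suggests that the scaling factors for that pair are off.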
There is, however, no way to incorporate background correction into DESeq,
because, for RNA-Seq analysis, this is not needed. With Nanostring, you
have cross-hybridization and hence background. If the background level
is the same in all samples, it should not influence your differential
expression calculation, and you can ignore it. Otherwise, it might be an
issue.
> In addition, I also did a t-test (paired and normal, equal variance,
> which I tested, on the log2 data), because this has been described in
> literature before.
This is no big surprise. With only a few replicates, a standard t-test
does not have much power. This was, after all, the motivation behind the
development of limma.
> Another paper describes an FDR permutation approach, but they don't seem
> to have any biological replicates, but 32 control experiments and 10
> control genes (Amit et al., 2009).
> I also tried to do this on our data.
I'm a bit puzzled how a permutation test without replicates might work,
but I don't know the paper.
> [...]
> A minor question relates to the preprocessing. How should I deal with
> absent/present calls obtained after the preprocessing in the course of
> statistical analysis?
> Should I include them as NAs from the beginning, or re-evaluate the
> results at the end?
Hard to say. As I am not familiar with the technology, I don't know what
they base these calls on. As you cannot incorporate background
correction anyway, it might be best to ignore the absence/presence
calls as well and use all data.
Please let us know whether it works.
Simon