[BioC] RNAseq expression analysis using DESeq: technical replicates, paired samples
Simon Anders
anders at embl.de
Tue Nov 8 09:51:44 CET 2011
Hi Michael
On 2011-11-07 23:44, Michael Muratet wrote:
> No, I really was just checking for updates in how to arrange the input.
> I understand the limitations of no biological replicates. However, as
> noted in section 6 of the vignette, '...such experiments are often
> undertaken...' and we hope to confirm what we find by other means.
>
> I thought I had seen documentation somewhere on how to incorporate
> technical replicates into deseq, but I could not put my hands back on
> it, so I thought I'd ask. I have actually processed the original data
> once and got what results we could get. I don't expect the tech
> replicates to change the answer.
Sorry if I jumped at you a bit in my answer; I have been asked this
question a bit too often recently, and maybe I should not write emails
late in the evening.
In my post from last year, I referred to the papers of Nagalakshmi et
al. and Marioni et al., who noticed that the variation between technical
replicates follows a Poisson distribution. Now, if you add up two
independent Poisson-distributed random variables, the sum is
Poisson-distributed, too; and this is why any method looking at the
technical replicates separately will not have more useful information to
work with than one that is given the sum.
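As a quick illustration of the additivity argument above, here is a minimal simulation sketch in base R. The rates 20 and 30 and the variable names are made up for the example; they do not come from any real experiment.

```r
# Two technical replicates of the same sample, sequenced to different
# depths (hypothetical Poisson rates 20 and 30 for one gene).
set.seed(1)
n <- 100000
rep1 <- rpois(n, lambda = 20)
rep2 <- rpois(n, lambda = 30)

# Summing the replicates gives counts whose mean and variance both sit
# close to 20 + 30 = 50, as expected for a single Poisson variable.
pooled <- rep1 + rep2
c(mean = mean(pooled), variance = var(pooled))

# For comparison: counts drawn directly from a Poisson with rate 50.
direct <- rpois(n, lambda = 50)
c(mean = mean(direct), variance = var(direct))
```

Both summaries should show a variance-to-mean ratio close to 1, which is the sense in which the sum carries the same information as the separate technical replicates.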
Even if you have overdispersion (i.e., variance in excess of Poisson)
between technical replicates, summing them up will smooth this out a
bit, and what is left will simply become part of the variance seen
between the biological replicates (i.e., the sums of the technical
ones). So if you have a layered experiment with both technical and
biological replicates, I do not think it is worth the effort to account
for this hierarchy in the analysis.
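In practice, "summing them up" just means adding the count columns that belong to the same biological sample before building the count data set. A minimal base R sketch follows; the count matrix, the gene names and the run labels are invented for illustration.

```r
# Toy count matrix: 3 genes, 4 sequencing runs. Runs A_run1/A_run2 are
# technical replicates of sample A, B_run1/B_run2 of sample B.
counts <- matrix(
  c( 10,  12,  30,  28,
      5,   7,   0,   1,
    100,  90,  50,  60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("A_run1", "A_run2", "B_run1", "B_run2"))
)

# Which biological sample each run (column) belongs to.
sample_of_run <- c("A", "A", "B", "B")

# rowsum() adds up rows by group, so transpose, sum per sample, and
# transpose back to get one column per biological sample.
pooled <- t(rowsum(t(counts), group = sample_of_run))
pooled
```

The resulting matrix has one column per biological sample and can be used as the count table for the analysis in place of the per-run counts.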
Your situation seems to be that you have no biological replicates but
expect to see only so few good hits that you can follow them up with
independent, replicated verification. For such cases, I recently
wondered whether the following trick might help:
Make your best guess for a biological coefficient of variation. Let us
say we expect, from experience with previous experiments, that the gene
expression between replicates, if you had them, would typically vary by
15% (i.e., 0.15). The dispersion is the square of this coefficient of
variation, and you can inject that value into your DESeq calculation:
# Start as usual:
library( DESeq )
# An example data set with only two samples:
cds <- makeExampleCountDataSet()[,2:3]
# Estimate the dispersion, only to throw it away in the next step
cds <- estimateDispersions( cds, method="blind",
                            sharingMode="fit-only" )
# Overwrite the estimates with our guessed fixed value:
fData(cds)$disp_blind <- .15^2
# See what you get
res <- nbinomTest( cds, "A", "B" )
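To see why the squared coefficient of variation is the value to plug in, the following base R sketch simulates negative binomial counts with variance mu + dispersion * mu^2 and compares the observed coefficient of variation with sqrt(1/mu + dispersion). It is only an illustration under that standard parametrisation (rnbinom with size = 1/dispersion); the mean values are arbitrary and it does not use DESeq itself.

```r
set.seed(1)
cv_guess   <- 0.15          # guessed biological coefficient of variation
dispersion <- cv_guess^2    # the value injected into the dispersion slot

# Arbitrary expression levels (mean counts) to look at.
mu <- c(10, 100, 10000)

# Simulate counts with var = mu + dispersion * mu^2 and measure the CV.
observed_cv <- sapply(mu, function(m) {
  x <- rnbinom(100000, size = 1 / dispersion, mu = m)
  sd(x) / mean(x)
})

# Shot noise (1/mu) plus biological variation (dispersion).
expected_cv <- sqrt(1 / mu + dispersion)

round(rbind(mu, observed_cv, expected_cv), 3)
```

For strongly expressed genes the Poisson term 1/mu becomes negligible and the coefficient of variation approaches the guessed 15%; for weakly expressed genes the counting noise dominates.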
If you have some hunch what the dispersion might be, you can get some
preliminary results that way. Do not take the p values too literally, of
course, as they are based on guesswork. Observe how the ranking of the
genes and the number of hits change if you try different dispersion
guesses. This helps you anticipate how much success you can hope for in
your subsequent verification effort, and for how many genes you
should attempt verification.
Some _very_ _rough_ (and maybe completely wrong) values for coefficients
of variation, as I have seen them in the past:
- Comparison between isogenic liquid yeast cell cultures, highly
controlled experiment: Below 10% or even 5% (dispersion .1^2 or .05^2)
if you are lucky
- Comparison between isogenic mammalian cell cultures: maybe around 15%
if things go well
- Comparison between tissues derived from isogenic litter mates: maybe
20% to 35%
- Comparison between different non-related individuals: 50% or even much
more
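For convenience, the rough coefficients of variation listed above can be turned into dispersion values by squaring them. A small base R sketch; the labels are shorthand invented here, and the numbers are just the rough guesses from the list, not measurements.

```r
# Rough CV guesses from the list above (labels are made-up shorthand).
cv <- c(yeast_lucky      = 0.05,
        yeast            = 0.10,
        mammalian_cells  = 0.15,
        littermates_low  = 0.20,
        littermates_high = 0.35,
        unrelated        = 0.50)

# Dispersion is the squared coefficient of variation.
guesses <- data.frame(cv = cv, dispersion = cv^2)
guesses
```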
Best regards
Simon