[BioC] How to use DESeq to normalize and estimate variance in a RNAseq timecourse analysis
Marie Sémon
marie.semon at ens-lyon.fr
Fri May 11 11:31:13 CEST 2012
Dear Simon,
>> In our dataset, we tried both procedures and we do see a difference in
>> the DESeq output. Maybe, as you said, the estimation of dispersion is
>> the same for both procedures, but the normalization step (estimation of
>> size factors) gives different outputs when using complete or partial
>> tables (with only a subset of the samples)?
>
> yes, this could be, but I'd be surprised if it made much of a
> difference in the test outcomes. If you are still worried about the
> issue, maybe post some details. What size factors do you get using
> only one time point at a time and what do you get using all of them
> together? Can you find an example for a gene where you see an
> appreciable difference in the p value? If so, are the dispersion
> estimates the same?
As you suggested, here is an example from our data (the time points
after treatment are T1, T2, T3, T4, T5; the three controls are Ctrl1,
Ctrl2, Ctrl3).
The size factors estimated on the complete table are the following:
Ctrl1 0.811399035473249
Ctrl2 0.858304900598826
Ctrl3 0.959802357106788
T1 0.947672016144435
T2 1.05315240155981
T3 1.13022212977686
T4 1.22731615452888
T5 1.19028477928069
The size factors estimated on a partial table (restricted to Controls +
T5) are the following:
Ctrl1 0.868784756382365
Ctrl2 0.918880737221278
Ctrl3 1.020617176156
T5 1.2627166738945
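
To put a number on the difference, one can take the per-sample ratio of the partial-table size factor to the complete-table one. The sketch below uses only the values listed above; the variable and function names are illustrative and are not part of DESeq.

```python
# Size factors copied from the two listings above
# (only the samples present in both tables).
complete = {
    "Ctrl1": 0.811399035473249,
    "Ctrl2": 0.858304900598826,
    "Ctrl3": 0.959802357106788,
    "T5": 1.19028477928069,
}
partial = {
    "Ctrl1": 0.868784756382365,
    "Ctrl2": 0.918880737221278,
    "Ctrl3": 1.020617176156,
    "T5": 1.2627166738945,
}

# Ratio partial / complete for each shared sample.
ratios = {sample: partial[sample] / complete[sample] for sample in partial}
for sample, ratio in ratios.items():
    print(f"{sample}: {ratio:.4f}")


def t5_vs_controls(size_factors):
    """Size factor of T5 relative to the mean size factor of the controls."""
    ctrl_mean = sum(size_factors[c] for c in ("Ctrl1", "Ctrl2", "Ctrl3")) / 3
    return size_factors["T5"] / ctrl_mean


print(f"T5 / mean(Ctrl), complete: {t5_vs_controls(complete):.4f}")
print(f"T5 / mean(Ctrl), partial:  {t5_vs_controls(partial):.4f}")
```

With these values, every partial-table factor is about 6 to 7% larger than its complete-table counterpart, while the size factor of T5 relative to the mean of the controls changes by less than 1%.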
As you can see, they seem to be quite different. This seems to translate
into different numbers of significant genes (between Ctrl and T5) in the
two cases: 2755 genes with padj < 0.001 when the complete table is used,
and 2976 genes with padj < 0.001 when the partial table is used.
Furthermore, the lists do not overlap completely:
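(rows: padj < 0.001 with the complete table; columns: padj < 0.001 with
the partial table)
                 partial FALSE   partial TRUE
complete FALSE           18135            303
complete TRUE               82           2673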
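
The margins of this cross-tabulation can be checked against the gene counts quoted above. A minimal sketch, assuming that rows correspond to the complete table and columns to the partial table (the assignment that reproduces 2755 and 2976); the variable names are illustrative.

```python
# Cell counts from the cross-tabulation above. Keys are
# (significant with complete table, significant with partial table).
table = {
    (False, False): 18135,
    (False, True): 303,
    (True, False): 82,
    (True, True): 2673,
}

# Marginal totals: genes significant under each normalization.
sig_complete = sum(n for (comp, part), n in table.items() if comp)
sig_partial = sum(n for (comp, part), n in table.items() if part)

both = table[(True, True)]
either = sig_complete + sig_partial - both
total = sum(table.values())

print("significant, complete table:", sig_complete)
print("significant, partial table: ", sig_partial)
print("significant in both:", both, "out of", either, "significant in either")
print("genes tested:", total)
print(f"Jaccard overlap: {both / either:.3f}")
```

The marginal totals reproduce the 2755 and 2976 genes quoted above, and 2673 of the 3058 genes that are significant in either analysis are shared.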
We picked two genes at random (gene A and gene B) and show the DESeq
results comparing Ctrl and T5, after normalizing using the partial or
the complete table.
Partial table
id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
geneA  1129.345865  965.1611989  1621.899863  1.680444536  0.748842926     0.000170905  1.19E-03
Complete table
id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
geneA  1203.113666  1030.619339  1720.596647  1.669478324  0.739397362     8.74E-06     9.09E-05
Partial table
id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
geneB  16.32456228  3.55138732   54.64408717  15.38668758  3.943610779     4.28E-05     3.47E-04
Complete table
id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
geneB  17.33523065  3.79053399   57.96932062  15.29318053  3.934816571     0.001910755  1.06E-02
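
To see which columns actually move between the two normalizations, the fold changes can be recomputed from the base means and set next to the p-values. This is only a sketch on the four rows above; the dictionary layout and the function name are illustrative and are not DESeq output or API.

```python
import math

# (baseMeanA, baseMeanB, pval) copied from the tables above.
results = {
    ("geneA", "partial"): (965.1611989, 1621.899863, 0.000170905),
    ("geneA", "complete"): (1030.619339, 1720.596647, 8.74e-06),
    ("geneB", "partial"): (3.55138732, 54.64408717, 4.28e-05),
    ("geneB", "complete"): (3.79053399, 57.96932062, 0.001910755),
}


def log2_fold_change(base_mean_a, base_mean_b):
    """log2 of the ratio of normalized means, condition B over condition A."""
    return math.log2(base_mean_b / base_mean_a)


for gene in ("geneA", "geneB"):
    a_part, b_part, pval_part = results[(gene, "partial")]
    a_comp, b_comp, pval_comp = results[(gene, "complete")]
    lfc_part = log2_fold_change(a_part, b_part)
    lfc_comp = log2_fold_change(a_comp, b_comp)
    print(
        f"{gene}: log2FC partial={lfc_part:.3f} complete={lfc_comp:.3f}, "
        f"pval complete/partial={pval_comp / pval_part:.3g}"
    )
```

For both genes the log2 fold change moves by about 0.01 between the two normalizations, whereas the p-value changes by more than an order of magnitude, and in opposite directions for gene A and gene B.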
I hope these results are detailed enough to help in understanding our
problem. If not, please do not hesitate to ask for more information.
Thanks a lot again for your help!
Marie