[BioC] DESeq with no replicates

Thu Dec 16 20:55:18 CET 2010

Dear list, 

I have a smallRNA sequencing data from two samples, without replicates.
I tried using DESeq to identify differentially expressed gene following
the manual, and the code is as below:

> library(DESeq)
> conds <- c("NSC", "TSC")
> head(combined_data)
                 NSC    TSC
hsa-let-7a    906684 413336
hsa-let-7a*        2   1281
hsa-let-7a-2*     51    132
hsa-let-7b    179751  63079
hsa-let-7b*       46    684
hsa-let-7c    380745  50729

> apply(combined_data, 2, sum)
    NSC     TSC 
5018095 8658429 

> cds <- newCountDataSet(combined_data, conds)
> cds <- estimateSizeFactors( cds )
> sizeFactors(cds)
NSC TSC 
  1   1 

> cds <- estimateVarianceFunctions(cds, pool=T)
> res <- nbinomTest(cds, "NSC", "TSC")

> head(res[order(res$pval),])

                id baseMean baseMeanA baseMeanB foldChange
log2FoldChange   pval      padj resVarA resVarB
397 hsa-miR-299-5p    329.5       329       330  1.0030395
0.004378441 0.002408825 0.9236618      NA      NA
131   hsa-miR-1285    407.0       406       408  1.0049261
0.007089425 0.002922980 0.9236618      NA      NA
196   hsa-miR-146a    170.5       170       171  1.0058824
0.008461579 0.004641615 0.9696484      NA      NA
525   hsa-miR-425*    180.5       178       183  1.0280899
0.039966407 0.008776619 0.9696484      NA      NA
826   hsa-miR-675*    714.0       722       706  0.9778393
-0.032330654 0.008892177 0.9696484      NA      NA
505   hsa-miR-379*    260.5       256       265  1.0351562
0.049848549 0.012192131 0.9696484      NA      NA

> sessionInfo()
R version 2.11.1 (2010-05-31) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] preprocessCore_1.10.0 limma_3.4.5           DESeq_1.0.6
locfit_1.5-6          lattice_0.18-8       
[6] akima_0.5-4           Biobase_2.8.0        

loaded via a namespace (and not attached):
 [1] annotate_1.26.1      AnnotationDbi_1.10.2 DBI_0.2-5
genefilter_1.30.0    geneplotter_1.26.0  
 [6] grid_2.11.1          RColorBrewer_1.0-2   RSQLite_0.9-2
splines_2.11.1       survival_2.35-8     
[11] tools_2.11.1         xtable_1.5-6    

I have a few questions here:

1. The sizeFactors were estimated as 1, but the actual total number of
reads in the two samples differ 60%, could it be correct? 

2. It seems that the identified top candidates with the smallest p value
are all with very small fold changes. I have seen huge difference
between the two samples for some miRNA, but the P value was close to 1;

3. The resVarA and resVarB are NA for all genes, did I do something
wrong when calling the estimateVarianceFunctions?

4. For a RNA or small RNA sequencing experiment with only two samples,
what is the best way to normalize data? 

Any help will be greatly appreciated.

Xiwei Wu, Ph.D.
Bioinformatics Core
Assistant Research Professor
Department of Molecular Medicine
Beckman Research Institute
City of Hope
x.65071

---------------------------------------------------------------------
SECURITY/CONFIDENTIALITY WARNING:  \ This message and ...{{dropped:30}}