[BioC] DESeq for an mRNA-seq time course

Thu Jan 5 00:33:10 CET 2012

Hello everyone,

I'm trying to figure out whether the DESeq package and its GLM function 
will work for a complex mRNA-seq time course. I have a time course of 
five time points, with treated and untreated samples at each time point, 
and multiple replicates. The preliminary results from this program make 
me sure I haven't set up the contrast correctly, because the results 
look wrong to me (more than half of the genome is differentially 
expressed at an FDR <1%). Am I missing something? I note that in the 
DESeq package vignette, the example case involves binary conditions 
(treated/untreated vs paired/single end). Will DESeq even work for a 
time course of five time points, or am I wasting my time? My script, 
with output, is below.  I apologize in advance - it is quite lengthy!

any help on this would be so fantastic!
Elena

 > options(digits=3)
 > setwd("C:\\Rdata\\DESeq\\GLMs")
 > library(DESeq)
 >
 > # input time course data
 > countsTable<-read.delim("input2_mut_ALL.txt",header=TRUE, 
stringsAsFactors=TRUE)
 > head(countsTable)
      gene untreat.1 untreat.2 D.1h.1 D.1h.2 U.1h.1 U1h.2 D.2h.1 D.2h.2 
U.2h.1 U.2h.2 D.6h.1 D.6h.2 U.6h.1 U.6h.2 D.18h.1 D.18h.2 U.18h.1 U.18h.2
1  2L52.1        98        55    100    103     98    69     70     
64     73     36     59     49     33     32      92      83      47      47
2 2RSSE.1       423       261    374    459    327   380    216    
215    191    223    146    155    117    110     196     133     
110     110
3 2RSSE.2         8         9      8     16      4     8      4      
3      2      3      2      1      2      0       2       1       1       1
4 2RSSE.3         0         0      0      0      0     0      0      
0      0      0      0      0      0      0       0       0       0       0
5   3R5.1        74        38     96     46     69    52     56     
41     38     32     31     14     28     31      38      33      81      81
6   3R5.2         0         0      0      0      0     0      0      
0      0      0      0      0      0      0       0       0       0       0

 > rownames(countsTable) <- countsTable$gene
 > countsTable <- countsTable [ , -1]
 >
 > # make data.frame
 > design <- read.table("C:\\Rdata\\DESeq\\puf8lip1\\GLMs\\design2.txt", 
header = T, row.names=1)
 > design<- as.data.frame(design)
 > is.data.frame(design)
[1] TRUE
 >
 > design
           treat.type time.hr
untreat.1       none       0
untreat.2       none       0
D.1h.1       vehicle       1
D.1h.2       vehicle       1
U.1h.1          drug       1
U.1h.2          drug       1
D.2h.2       vehicle       2
D.2h.1       vehicle       2
U.2h.1          drug       2
U.2h.2          drug       2
D.6h.1       vehicle       6
D.6h.2       vehicle       6
U.6h.1          drug       6
U.6h.2          drug       6
D.18h.1      vehicle      18
D.18h.2      vehicle      18
U.18h.1         drug      18
U.18h.2         drug      18
 >
 > # create a count data object
 > cds <- newCountDataSet( countsTable, design)
 > head(counts(cds))
         untreat.1 untreat.2 D.1h.1 D.1h.2 U.1h.1 U1h.2 D.2h.1 D.2h.2 
U.2h.1 U.2h.2 D.6h.1 D.6h.2 U.6h.1 U.6h.2 D.18h.1 D.18h.2 U.18h.1 U.18h.2
2L52.1         98        55    100    103     98    69     70     64     
73     36     59     49     33     32      92      83      47      47
2RSSE.1       423       261    374    459    327   380    216    215    
191    223    146    155    117    110     196     133     110     110
2RSSE.2         8         9      8     16      4     8      4      
3      2      3      2      1      2      0       2       1       1       1
2RSSE.3         0         0      0      0      0     0      0      
0      0      0      0      0      0      0       0       0       0       0
3R5.1          74        38     96     46     69    52     56     41     
38     32     31     14     28     31      38      33      81      81
3R5.2           0         0      0      0      0     0      0      
0      0      0      0      0      0      0       0       0       0       0
 >
 > # estimate library size of count data set
 > cds <- estimateSizeFactors ( cds )
 > sizeFactors( cds)
untreat.1 untreat.2    D.1h.1    D.1h.2    U.1h.1     U1h.2    D.2h.1    
D.2h.2    U.2h.1    U.2h.2    D.6h.1    D.6h.2    U.6h.1    U.6h.2   
D.18h.1   D.18h.2
     2.307     1.149     1.435     1.301     1.370     1.230     
1.152     1.040     1.079     0.926     0.602     0.593     0.620     
0.596     0.926     0.694
   U.18h.1   U.18h.2
     0.887     0.887
 >
 > # estimate dispersion (biological variation)
 > # this is done for each condition/factor
 > cds <- estimateDispersions( cds , method = "pooled")
 >
 > ### Check the fit ######
 >
 > plotDispEsts <- function( cds )
+ {
+    plot(
+       rowMeans( counts( cds, normalized=TRUE ) ),
+       fitInfo(cds)$perGeneDispEsts,
+       pch = 8, log="xy",xlab = "Per Gene Average Expression Level (in 
counts)", ylab= "Per Gene Variance Estimate")
+    xg <- 10^seq( -.5, 5, length.out=300 )
+    lines( xg, fitInfo(cds)$dispFun( xg ), col="red", pch=16 )
+ }
 >
 > plotDispEsts(cds)
Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) :
   6941 x values <= 0 omitted from logarithmic plot
2: In xy.coords(x, y, xlabel, ylabel, log) :
   6968 y values <= 0 omitted from logarithmic plot
 > head(fData(cds)) #dispersion values are stored in the feature data 
slot of cds
         disp_pooled
2L52.1       0.0245
2RSSE.1      0.0192
2RSSE.2      0.3018
2RSSE.3         Inf
3R5.1        0.0308
3R5.2           Inf
 > str(fitInfo(cds)) #verify that the displ_pooled
List of 5
  $ perGeneDispEsts: num [1:27924] 0.00641 0.0192 0.10196 NaN 0.02612 ...
  $ dispFunc       :function (q)
   ..- attr(*, "coefficients")= Named num [1:2] 0.00894 1.04347
   .. ..- attr(*, "names")= chr [1:2] "asymptDisp" "extraPois"
   ..- attr(*, "fitType")= chr "parametric"
  $ fittedDispEsts : num [1:27924] 0.0245 0.0137 0.3018 Inf 0.0308 ...
  $ df             : int 9
  $ sharingMode    : chr "maximum"
 >
 > ###### Fit data to Gen. Lin. Model #####
 >
 > fit1 <- fitNbinomGLMs(cds, count ~ treat.type + time.hr)
..................
There were 50 or more warnings (use warnings() to see the first 50)
 > fit0 <- fitNbinomGLMs(cds, count ~ treat.type)
..................
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: algorithm did not converge
3: glm.fit: algorithm did not converge
4: glm.fit: algorithm did not converge
5: glm.fit: algorithm did not converge
 >
 > ###DIFF EXPRESSION ##########
 >
 > str(fit1)
'data.frame':   27924 obs. of  6 variables:
  $ (Intercept)    : num  6.24 8.16 2.67 NA 5.28 ...
  $ treat.typenone : num  -0.757 -0.484 -0.225 NA -0.255 ...
  $ treat.typeU0126: num  -0.532 -0.295 -0.713 NA 0.226 ...
  $ time.hr        : num  0.0171 -0.0392 -0.1143 NA 0.035 ...
  $ deviance       : num  13.95 22.92 8.29 NA 23.29 ...
  $ converged      : logi  TRUE TRUE TRUE NA TRUE NA ...
  - attr(*, "df.residual")= Named num 14
   ..- attr(*, "names")= chr "2L52.1"
 > pvalsGLM <- nbinomGLMTest (fit1, fit0)
 > padjGLM <- p.adjust (pvalsGLM, method = "BH") # correct for mult.test
 > head(fit1) # hope to see fully-converged, ie TRUE
         (Intercept) treat.typenone treat.typeU0126 time.hr deviance 
converged
2L52.1         6.24         -0.757          -0.532  0.0171    13.95      
TRUE
2RSSE.1        8.16         -0.484          -0.295 -0.0392    22.92      
TRUE
2RSSE.2        2.67         -0.225          -0.713 -0.1143     8.29      
TRUE
2RSSE.3          NA             NA              NA      NA       
NA        NA
3R5.1          5.28         -0.255           0.226  0.0350    23.29      
TRUE
3R5.2            NA             NA              NA      NA       
NA        NA
 > head(padjGLM)
[1] 1.73e-01 1.39e-05 3.66e-02       NA 6.29e-03       NA
 >
 > hist(pvalsGLM, breaks=100, col="skyblue", border="slateblue", main="")
 >
 >
 > ######### Filter for significant genes at p < 0.05 ########
 > res <- cbind(fit1,pvalsGLM,padjGLM)
 > resSig <- res[ res$padjGLM < 0.05, ]
 >
 > ######### LAST PART: WRITE RESULTS TO .CSV FILE ###########
 > write.table(resSig, file="DESeqOutput.txt", quote=FALSE)
 >
 > ############
 > sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pasilla_0.2.10 DESeq_1.6.1    locfit_1.5-6   lattice_0.20-0 
akima_0.5-4    Biobase_2.14.0

loaded via a namespace (and not attached):
  [1] annotate_1.32.0       AnnotationDbi_1.16.10 DBI_0.2-5             
genefilter_1.36.0     geneplotter_1.32.1    grid_2.14.0           
IRanges_1.12.5
  [8] RColorBrewer_1.0-5    RSQLite_0.11.1        splines_2.14.0        
survival_2.36-10      tools_2.14.0          xtable_1.6-0
 >