[BioC] Multifactor analysis with deseq

Fri Mar 22 20:52:03 CET 2013

Dear Dr Anders

I have a few questions about applying deseq to a multifactor analysis that I am doing. I have three factors: three levels of genotype, three levels of "cell culture" and two time points. I have three replicates in each cell. The comparison(s) of most interest are changes in expression with genotype and days. I am also interested in knowing the effects of cell culture, but it's a technical issue.

I have used HTSeq-count with the ENSEMBL v54 H.sapiens gtf to get counts from reads aligned with tophat. I have 54 libraries total. The first few rows of the conditions file look like:

        day culture genotype
SL27450   7 33D6p63     NT
SL27451   7 33D6p63	NT
SL27452   7 33D6p63     NT
SL27456  14 33D6p63     NT
SL27457  14 33D6p63     NT
SL27458  14 33D6p63     NT

such that each set of factors is repeated three times.

After normalization and dispersion estimation I did:

fit0 <- fitNbinomGLMs(cds, count~genotype+day+culture)
fit1 <- fitNbinomGLMs(cds, count~day+culture)
then
pvalsGLM = nbinomGLMTest( fit1, fit0 )

The result was entirely NaN or NAs. I'll put some output and the environment at the end of this email. I'm not sure where to start looking for the problem.

At the moment, I am wondering how best to implement the pairwise comparison alluded to in the "fix me" in the manual I have. I am wondering:

Does the formula in the fitNbinomGLMs() function work the same way as lm(), e.g., do I have to explicitly include interactions or will it calculate them for me?

Is there a way to use relevel() to set the reference level? I have a wild-type genotype that would be natural for comparing two mutant genotypes.

I don't see any references on the list, but has anybody tried to use normalized values from a CountDataSet outside of deseq?

Lastly, I saw on the list while researching this issue that deseq2 is out. I'll give it try.

Here is str() output from the fits (which seems OK to me)

> str(fit0)
'data.frame':   37435 obs. of  8 variables:
 $ (Intercept)   : num  9.01 1.8 7.69 7.06 6.63 ...
 $ genotypeNT    : num  -0.0666 0.0223 -0.08 -0.1554 -0.1967 ...
 $ genotypeWT    : num  -0.1665 -0.6294 -0.0716 -0.0908 -0.0754 ...
 $ day7          : num  -0.0849 -0.4425 -0.0596 -0.1809 -0.2374 ...
 $ culture33D6p63: num  -0.153 -1.917 0.119 0.715 0.518 ...
 $ culture34D6p62: num  -0.2925 -0.3422 0.0181 0.242 0.0233 ...
 $ deviance      : num  46.1 44.6 51.8 21.1 60.2 ...
 $ converged     : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 - attr(*, "df.residual")= num 48
> str(fit1)
'data.frame':   37435 obs. of  6 variables:
 $ (Intercept)   : num  8.94 1.6 7.64 6.99 6.54 ...
 $ day7          : num  -0.0826 -0.4417 -0.0589 -0.1793 -0.2361 ...
 $ culture33D6p63: num  -0.163 -1.899 0.114 0.708 0.507 ...
 $ culture34D6p62: num  -0.3033 -0.3009 0.0161 0.2411 0.0191 ...
 $ deviance      : num  49.7 46.9 52.7 23.9 64.1 ...
 $ converged     : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 - attr(*, "df.residual")= num 50

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.utf-8        LC_COLLATE=en_US.utf-8    
 [5] LC_MONETARY=en_US.utf-8    LC_MESSAGES=en_US.utf-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] gplots_2.11.0      MASS_7.3-23        KernSmooth_2.23-9  caTools_1.14      
 [5] gdata_2.12.0       gtools_2.7.0       RColorBrewer_1.0-5 multcomp_1.2-17   
 [9] survival_2.37-4    mvtnorm_0.9-9994   heatmap.plus_1.3   DESeq_1.10.1      
[13] lattice_0.20-6     locfit_1.5-8       Biobase_2.18.0     BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] annotate_1.36.0      AnnotationDbi_1.20.6 bitops_1.0-5        
 [4] DBI_0.2-5            genefilter_1.40.0    geneplotter_1.36.0  
 [7] IRanges_1.16.6       parallel_2.15.0      RSQLite_0.11.2      
[10] stats4_2.15.0        tools_2.15.0         XML_3.95-0.2        
[13] xtable_1.7-1        

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806