[BioC] Multifactor analysis with deseq
Michael Muratet
mmuratet at hudsonalpha.org
Fri Mar 22 20:52:03 CET 2013
Dear Dr Anders
I have a few questions about applying deseq to a multifactor analysis that I am doing. I have three factors: three levels of genotype, three levels of "cell culture" and two time points. I have three replicates in each cell. The comparison(s) of most interest are changes in expression with genotype and days. I am also interested in knowing the effects of cell culture, but it's a technical issue.
I have used HTSeq-count with the ENSEMBL v54 H.sapiens gtf to get counts from reads aligned with tophat. I have 54 libraries total. The first few rows of the conditions file look like:
day culture genotype
SL27450 7 33D6p63 NT
SL27451 7 33D6p63 NT
SL27452 7 33D6p63 NT
SL27456 14 33D6p63 NT
SL27457 14 33D6p63 NT
SL27458 14 33D6p63 NT
such that each set of factors is repeated three times.
After normalization and dispersion estimation I did:
fit0 <- fitNbinomGLMs(cds, count~genotype+day+culture)
fit1 <- fitNbinomGLMs(cds, count~day+culture)
then
pvalsGLM = nbinomGLMTest( fit1, fit0 )
The result was entirely NaN or NAs. I'll put some output and the environment at the end of this email. I'm not sure where to start looking for the problem.
At the moment, I am wondering how best to implement the pairwise comparison alluded to in the "fix me" in the manual I have. I am wondering:
Does the formula in the fitNbinomGLMs() function work the same way as lm(), e.g., do I have to explicitly include interactions or will it calculate them for me?
Is there a way to use relevel() to set the reference level? I have a wild-type genotype that would be natural for comparing two mutant genotypes.
I don't see any references on the list, but has anybody tried to use normalized values from a CountDataSet outside of deseq?
Lastly, I saw on the list while researching this issue that deseq2 is out. I'll give it try.
Here is str() output from the fits (which seems OK to me)
> str(fit0)
'data.frame': 37435 obs. of 8 variables:
$ (Intercept) : num 9.01 1.8 7.69 7.06 6.63 ...
$ genotypeNT : num -0.0666 0.0223 -0.08 -0.1554 -0.1967 ...
$ genotypeWT : num -0.1665 -0.6294 -0.0716 -0.0908 -0.0754 ...
$ day7 : num -0.0849 -0.4425 -0.0596 -0.1809 -0.2374 ...
$ culture33D6p63: num -0.153 -1.917 0.119 0.715 0.518 ...
$ culture34D6p62: num -0.2925 -0.3422 0.0181 0.242 0.0233 ...
$ deviance : num 46.1 44.6 51.8 21.1 60.2 ...
$ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
- attr(*, "df.residual")= num 48
> str(fit1)
'data.frame': 37435 obs. of 6 variables:
$ (Intercept) : num 8.94 1.6 7.64 6.99 6.54 ...
$ day7 : num -0.0826 -0.4417 -0.0589 -0.1793 -0.2361 ...
$ culture33D6p63: num -0.163 -1.899 0.114 0.708 0.507 ...
$ culture34D6p62: num -0.3033 -0.3009 0.0161 0.2411 0.0191 ...
$ deviance : num 49.7 46.9 52.7 23.9 64.1 ...
$ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
- attr(*, "df.residual")= num 50
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf-8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf-8 LC_COLLATE=en_US.utf-8
[5] LC_MONETARY=en_US.utf-8 LC_MESSAGES=en_US.utf-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid splines stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] gplots_2.11.0 MASS_7.3-23 KernSmooth_2.23-9 caTools_1.14
[5] gdata_2.12.0 gtools_2.7.0 RColorBrewer_1.0-5 multcomp_1.2-17
[9] survival_2.37-4 mvtnorm_0.9-9994 heatmap.plus_1.3 DESeq_1.10.1
[13] lattice_0.20-6 locfit_1.5-8 Biobase_2.18.0 BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] annotate_1.36.0 AnnotationDbi_1.20.6 bitops_1.0-5
[4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0
[7] IRanges_1.16.6 parallel_2.15.0 RSQLite_0.11.2
[10] stats4_2.15.0 tools_2.15.0 XML_3.95-0.2
[13] xtable_1.7-1
Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)
Room 4005
601 Genome Way
Huntsville, Alabama 35806
More information about the Bioconductor
mailing list