[BioC] DESeq2 : interactions ?
Michael Love
michaelisaiahlove at gmail.com
Thu May 1 14:30:02 CEST 2014
hi Yvan,
It's good to keep replies on the Bioc list, so that other users can
track down answers.
On Thu, May 1, 2014 at 8:21 AM, Yvan Wenger <yvan.wenger at unige.ch> wrote:
> Do you know if there is a simple way to specify that time is a factor?
>
after the line with DESeqDataSetFromMatrix, you can do:
colData(dds_H58)$time <- factor(colData(dds_H58)$time,
levels=unique(colData(dds_H58)$time))
Mike
>
> On Thu, May 1, 2014 at 1:52 PM, Michael Love
> <michaelisaiahlove at gmail.com> wrote:
>> hi Yvan,
>>
>> Are you specifying time as a numeric? If so, then you are not asking
>> for genes which show any change over time specific to condition, but
>> for genes which have consistent fold change by hour specific to
>> condition (i.e. genes with counts which triple every hour, or halve
>> every hour, etc.). Try instead specifying time as a factor.
>>
>> Mike
>>
>> On Thu, May 1, 2014 at 5:57 AM, Yvan Wenger <yvan.wenger at unige.ch> wrote:
>>> Dear list members,
>>>
>>> There must be something I missed. I have an RNA-seq timecourse
>>> experiment with triplicates (see below), there are two conditions: H5
>>> and H8, they can be thought of control/treatment.
>>>
>>> I am trying to assess if some genes react differently over time
>>> between the two conditions. I use the code below with:
>>>
>>> full model: design = ~ time + condition + time:condition
>>> reduced model: reduced = ~ time + condition
>>>
>>> Out of 33'000 genes, I find 732 with interaction between time and
>>> condition (FDR < 0.1). Nice.
>>>
>>> The problem, some genes that I was expecting to appear in this dataset
>>> do not (see the attached pdf for example, I am only concerned with
>>> differences in H5 and H8). Furthermore, when looking at the FDR of
>>> this gene, I can see that it is very far from being significant, the
>>> FDR is 0.97 !
>>>
>>> So I think I am doing something wrong in the way I compare the full
>>> model and the reduce model, probably I am not analyzing the data the
>>> way I want. To recapitulate, what I would like to select are different
>>> behaviours over time between the two conditions.
>>>
>>> Many thanks for your inputs !
>>>
>>> Yvan
>>>
>>>
>>>
>>>
>>> library(DESeq2)
>>>
>>> countData <- read.table("counts_all.txt",row.names=1,header=T)
>>> countData_H58 <- countData[,31:90] ### data from table 1 to 30 are
>>> from another condition not considered here.
>>> colData_H58 <- as.data.frame(read.table(file='conditions_H58')) ###
>>> content of the conditions_H58 file pasted below
>>> dds_H58 <- DESeqDataSetFromMatrix(countData = countData_H58,colData =
>>> colData_H58, design = ~ time + condition + time:condition) ### Full
>>> model
>>> dds_H58 <- estimateSizeFactors(dds_H58)
>>> dds_H58 <- estimateDispersions(dds_H58)
>>> write.table(file="normacounts_H58.csv", counts(dds_H58,normalized=TRUE))
>>> ddsLRT_H58 <- nbinomLRT(dds_H58, reduced = ~ time + condition) ### reduced model
>>> res_H58 <- results(ddsLRT_H58,cooksCutoff=FALSE)
>>> write.table(res_H58,file='results_H58.csv',sep="\t",quote=F)
>>>
>>>
>>>
>>> condition time
>>> 031_H5_000C H5 0
>>> 032_H5_000A H5 0
>>> 033_H5_000B H5 0
>>> 034_H5_005C H5 0.5
>>> 035_H5_005A H5 0.5
>>> 036_H5_005B H5 0.5
>>> 037_H5_010C H5 1
>>> 038_H5_010A H5 1
>>> 039_H5_010B H5 1
>>> 040_H5_020C H5 2
>>> 041_H5_020A H5 2
>>> 042_H5_020B H5 2
>>> 043_H5_040C H5 4
>>> 044_H5_040A H5 4
>>> 045_H5_040B H5 4
>>> 046_H5_080C H5 8
>>> 047_H5_080A H5 8
>>> 048_H5_080B H5 8
>>> 049_H5_160C H5 16
>>> 050_H5_160A H5 16
>>> 051_H5_160B H5 16
>>> 052_H5_240C H5 24
>>> 053_H5_240A H5 24
>>> 054_H5_240B H5 24
>>> 055_H5_360C H5 36
>>> 056_H5_360A H5 36
>>> 057_H5_360A H5 36
>>> 058_H5_480B H5 48
>>> 059_H5_480C H5 48
>>> 060_H5_480A H5 48
>>> 061_H8_000B H8 0
>>> 062_H8_000C H8 0
>>> 063_H8_000A H8 0
>>> 064_H8_005B H8 0.5
>>> 065_H8_005C H8 0.5
>>> 066_H8_005A H8 0.5
>>> 067_H8_010B H8 1
>>> 068_H8_010C H8 1
>>> 069_H8_010A H8 1
>>> 070_H8_020B H8 2
>>> 071_H8_020C H8 2
>>> 072_H8_020A H8 2
>>> 073_H8_040B H8 4
>>> 074_H8_040C H8 4
>>> 075_H8_040A H8 4
>>> 076_H8_080B H8 8
>>> 077_H8_080C H8 8
>>> 078_H8_080A H8 8
>>> 079_H8_160B H8 16
>>> 080_H8_160C H8 16
>>> 081_H8_160A H8 16
>>> 082_H8_240B H8 24
>>> 083_H8_240C H8 24
>>> 084_H8_240A H8 24
>>> 085_H8_360A H8 36
>>> 086_H8_360B H8 36
>>> 087_H8_360C H8 36
>>> 088_H8_480A H8 48
>>> 089_H8_480B H8 48
>>> 090_H8_480C H8 48
>>>
>>>
>>>
>>>
>>>> sessionInfo()
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] DESeq2_1.4.0 RcppArmadillo_0.4.100.2.1
>>> [3] Rcpp_0.11.1 GenomicRanges_1.16.1
>>> [5] GenomeInfoDb_1.0.2 IRanges_1.22.3
>>> [7] BiocGenerics_0.10.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] annotate_1.42.0 AnnotationDbi_1.26.0 Biobase_2.24.0
>>> [4] DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0
>>> [7] grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1
>>> [10] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0
>>> [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0
>>> [16] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list