> Hi Mike,
> I am writing to follow up on the multiple comparisons using contrasts.
> To remind you again, here’s my experimental design:
> Genotypes: 4 different genotypes
> Timepoint: 3 different timepoints (6h, 12h, and 24h)
> Temperature: Low and high temperatures
> 3 biological replicates for each condition.
> In the previous post, you suggested the following design:
> ~ genotype + time + temp + genotype:temp + time:temp
> And then call
> resultsNames(dds)
>
> In order to see all the interactions which are available for generating
> tests. For example:
> results(dds, name="genotypeA:tempHi")
>
> ...will provide you with the results of a test of whether the high
> temperature vs the low temperature has a specific effect for genotype A,
> over all time points.
> My questions are:
>
> 1. To make sure we are on the same page: did you recommend that I use
> the gene counts from all genotypes, at all timepoints and temperatures,
> for the differential expression step (dds) before calling results(dds)
> for the comparisons of interest?
Yes, I was recommending you run with all samples in the dataset object.
> Or, should I just pull out gene counts of
> genotypes/timepoints/temperature I am interested in for the dds step before
> calling results (dds) for comparisons of interest?
> What I have done is
> i) using gene counts from all samples for differential expression
> step (dds) before calling results(dds); see output#3
> ii) using gene counts from a subset of my samples for differential
> expression step (dds) before calling results(dds); see output#2
> When I tried performing differential expression using gene counts from
> all samples (at all timepoints and temperatures), I received these warning
> messages from R (please see output #1 in the box below).
>
This warning means that the parametric trend for dispersion is not
appropriate for your data. I would run DESeq() with the argument
fitType="mean".
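
A minimal sketch of that call, assuming a current DESeq2 installation. The simulated dataset from makeExampleDESeqDataSet() and the object name dds_mean are stand-ins for illustration only, not your data:

```r
library(DESeq2)

# Simulated counts as a stand-in for the real dataset (illustration only)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Use the mean of the gene-wise dispersion estimates
# instead of fitting the parametric dispersion trend
dds_mean <- DESeq(dds, fitType = "mean")

# Check which type of dispersion fit was actually used
attr(dispersionFunction(dds_mean), "fitType")
```

With your own object you would pass the same fitType argument alongside the other arguments you already use.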
> On a different note, when I tried using only the gene counts of a subset
> of samples I wanted to compare, DESeq2 (version 1.2.6) automatically
> determined the types of comparisons I could make in the resultsNames(dds).
> For example,
>
> >resultsNames(dds2)
> "Intercept" "temp_temp1_vs_temp2" "time_24h_vs_12h" "time_6h_vs_12h"
> "temp2.time24h" "temp2.time6h"
>
> How are all these comparisons pre-determined?
These are determined by the R function model.matrix() using the levels of
the factors in the colData of the subsetted dataset object. Temp 1 and Time
12h are the base levels of these factors. You should specify which levels
you want as the base levels before running DESeq().
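
As a rough sketch of how the base levels drive the coefficient names, here is a base-R example. The sample table below is hypothetical (made-up level names modeled on your output), so adapt it to your own colData:

```r
# Hypothetical sample table: 2 temperatures x 3 timepoints x 3 replicates
colData <- data.frame(
  temp = factor(rep(c("TempLow", "TempHi"), each = 9)),
  time = factor(rep(rep(c("6h", "12h", "24h"), each = 3), times = 2))
)

# By default the levels are sorted as character strings,
# and the first level becomes the base level
levels(colData$temp)
levels(colData$time)

# Set the base levels explicitly
colData$temp <- relevel(colData$temp, ref = "TempLow")
colData$time <- factor(colData$time, levels = c("6h", "12h", "24h"))

# The model matrix columns (and hence the comparisons) follow from these levels
mm <- model.matrix(~ temp + time + time:temp, data = colData)
colnames(mm)
```

The releveling has to happen before DESeqDataSetFromMatrix() and DESeq() are called, so that the fitted coefficients are expressed relative to the levels you chose.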
> When I called results(dds), does it compare effect of tempHi versus
> tempLow on genotype A, over all time points? Please see the output #3 for
> reference.
No, it is the effect for " GenotypeB" (note the leading space), because this
level sorts alphabetically before genotypeA and so is the base level. We make
it very clear in the vignette about the importance of setting the base level
of factors. If you set "A" as base level, it would be as you said.
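
A small sketch of cleaning up the level names and setting the base level, in base R. The genotype labels here are hypothetical, modeled on the names in your output:

```r
# Hypothetical genotype labels, one of them with a stray leading space
genotype <- factor(rep(c(" GenotypeB", "genotypeA", "genotypeC", "GenotypeD"),
                       each = 3))

# Strip leading/trailing whitespace from the labels
genotype <- factor(trimws(as.character(genotype)))

# Make genotype A the base level explicitly, rather than relying on sort order
genotype <- relevel(genotype, ref = "genotypeA")
levels(genotype)
```

After this, the main temperature effect in the model with genotype:temp would refer to genotype A, and the genotype interaction terms would be differences relative to A.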
> Finally, the results of differentially expressed genes for i) and ii) are
> different. So, I’d like to make sure which step I should be doing and if
> there is anything wrong with my R commands.
>
> Yoong
> -- output of sessionInfo():
> Output#1
> > dds1= DESeq(dds,betaPrior=FALSE)
> estimating size factors
> estimating dispersions
> gene-wise dispersion estimates
> mean-dispersion relationship
> final dispersion estimates
> fitting model and testing
> There were 12 warnings (use warnings() to see them)
> > warnings()
> Warning messages:
> 1: glm.fit: algorithm did not converge
> 2: glm.fit: algorithm did not converge
> 3: glm.fit: algorithm did not converge
> 4: glm.fit: algorithm did not converge
> 5: glm.fit: algorithm did not converge
> 6: glm.fit: algorithm did not converge
> 7: glm.fit: algorithm did not converge
> 8: glm.fit: algorithm did not converge
> 9: glm.fit: algorithm did not converge
> 10: glm.fit: algorithm did not converge
> 11: glm.fit: algorithm did not converge
> 12: In parametricDispersionFit(mcols(objectNZ)$baseMean[useForFit], ... :
> dispersion fit did not converge
> Output#2:
> >dds1 = DESeqDataSetFromMatrix(countData = GenotypeA, colData = colData, design = ~temp+time+time:temp)
> >dds2= DESeq(dds1,betaPrior=FALSE)
> >resultsNames(dds2)
>
> "Intercept" "temp_TempHi_vs_TempLow" "time_24h_vs_12h" "time_6h_vs_12h"
> "TempLow.time24h" "TempHi.time6h"
>
> >results(dds2,name="temp_TempHi_vs_TempLow")
>
> Output#3:
>
> >dds3 = DESeqDataSetFromMatrix(countData = allData, colData = colData, design = ~genotype+time+temp+genotype:temp+time:temp)
> >dds4= DESeq(dds3,betaPrior=FALSE)
> WARNINGS
>
> > resultsNames(dds4)
> [1] "Intercept" "genotypeC_vs_ GenotypeB " "genotypeA_vs_ GenotypeB"
> [4] "genotype_GenotypeD_vs_GenotypeB" "time_24h_vs_12h" "time_6h_vs_12h"
> [7] "temp_TempHi_vs_TempLow" "genotypeC.TempHi" "genotypeA.TempHi"
> [10] "genotypeD.TempHi" "time24h.TempHi" "time6h.TempHi"
> >results(dds4,name="temp_TempHi_vs_TempLow")
