[BioC] DESeq vs DESeq2 have different DEGs results

Michael Love michaelisaiahlove at gmail.com
Wed May 21 15:36:39 CEST 2014


hi Catalina,

Simon addressed the question about models, but I want to follow up on
another point.

On Tue, May 20, 2014 at 9:03 PM, Catalina Aguilar Hurtado
<catagui at gmail.com> wrote:
>
> Otherwise it would mean that my whole experiment didn’t work and can not
> get any data out of it.
>

in the sense that it was underpowered to detect changes from the 5
control to the 4 treated samples at an FDR cutoff of 0.05. Note that
1/20 is an arbitrary cutoff. You might also consider valuable to
create lists of DE genes with 1/5 false discoveries for follow up
study, although it might be the case that there are few genes at this
threshold as well. Furthermore, you can still use the experiment for
exploratory analysis. For instance, examining the MA plot will give
you insights into how many genes had very high or low fold changes.
You can rank the results table by absolute value of log2FoldChange,
and prioritize these genes for follow up study.

Mike


> Thanks again,
>
> Catalina
>
>
> On Tue, May 20, 2014 at 10:12 PM, Michael Love
> <michaelisaiahlove at gmail.com>wrote:
>
>> hi Catalina,
>>
>>
>> On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado
>> <catagui at gmail.com> wrote:
>> > Hi Mike,
>> >
>> > Not sure what you mean by the intersection.
>>
>> I meant those genes which had padj < cutoff for both DESeq and DESeq2.
>>
>> You can calculate this with
>>
>> length(intersect(padj1rownames, padj2rownames))
>>
>> It's safest to work with the rownames, to make sure you are referring
>> to the same genes.
>>
>> You can get this vector, e.g. in DESeq2 with:
>>
>> padj2rownames <- rownames(results[which(results$padj < cutoff),])
>>
>> > I have run both without any
>> > filter and reduced formula of ~Subject. As you can see below, there is
>> > almost no DEGs and the only way I could get was when ~1.
>> >
>>
>> This means that, without independent filtering, for most genes, the
>> treatment effect was too small to observe a significant difference for
>> the experiment's sample size and/or sequencing depth. However, you
>> should use independent filtering to report as many genes as possible,
>> and I'd recommend using DESeq2 which will do this automatically within
>> the results() function.
>>
>> The reason I asked for you to not use independent filtering in both is
>> to see if in fact there were many genes called by either version of
>> DESeq but not by both, when the settings were made comparable.
>>
>> Note that the ~ 1 reduced design is not what you want:
>>
>> OK: "we tested for the treatment effect, controlling for the subject
>> effect"
>> full: ~ subject + treament
>> reduced: ~ subject
>>
>>
>> not ok: "we tested for the treatment and subject effect"
>> full: ~ subject + treament
>> reduced: ~ 1
>>
>>
>> Mike
>>
>>
>> > DESeq2:
>> >
>> >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject)
>> >> resLRT <- results(dds, independentFiltering=FALSE)
>> >
>> >> sum( resLRT$padj < 0.5, na.rm=TRUE )
>> > [1] 6
>> >
>> >
>> > DESeq:
>> >
>> >
>> >>fit0 <- fitNbinomGLMs (cds, count~Subject)
>> >
>> >>fit1 <- fitNbinomGLMs (cds, count~Treatment)
>> >
>> >>pvals <- nbinomGLMTest (fit1, fit0)
>> >
>> >> padj <- p.adjust( pvals, method="BH" )
>> >> padj <- data.frame(padj)
>> >
>> >> row.names(padj)=row.names(cds)
>> >> padj_fil <- subset (padj,padj <0.05 )
>> >> dim (padj_fil)
>> > [1] 0 1
>> >
>> > Thanks,
>> >
>> > Cata
>> >
>> > On Mon, May 19, 2014 at 10:31 PM, Michael Love
>> > <michaelisaiahlove at gmail.com>wrote:
>> >
>> >> Hi Catalina,
>> >>
>> >> What are the numbers exactly and the intersection for using a reduced
>> >> design of ~ subject for both, and not filtering in either case
>> >> (independentFiltering = FALSE)?
>> >>
>> >> Mike
>> >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui at gmail.com
>> >
>> >> wrote:
>> >>
>> >>> Hi Mike,
>> >>>
>> >>> Thanks for your reply.
>> >>>
>> >>> I had read the manual and did try the reduced design ~Subject and got 5
>> >>> DEGs. I tried with both DESeq and DESeq2.
>> >>>
>> >>> My data has a stronger effect of the subject compare to the treatment,
>> if
>> >>> I
>> >>> do a heatmap the samples group by subject rather than by treatment.
>> >>>
>> >>> The filtering theta value was choose based on a plot of "Number of
>> DEGs vs
>> >>> theta" when using filtered_R in genefilter. Thanks for pointing out
>> that
>> >>> results() was doing automatically. I have just run it again without
>> >>> pre-filter and got 812 DEGs.
>> >>>
>> >>> Still my concern is why the DEGs that I get in DESeq disappear in the
>> >>> DESeq2 results?
>> >>>
>> >>> Thanks again,
>> >>>
>> >>> Catalina
>> >>>
>> >>>
>> >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love
>> >>> <michaelisaiahlove at gmail.com>wrote:
>> >>>
>> >>> > hi Catalina,
>> >>> >
>> >>> > If you want to test for the treatment effect and control the subject
>> >>> > effect you should use the reduced design of ~ subject.
>> >>> >
>> >>> > In the DESeq2 vignette we say: "the user provides the full formula
>> >>> > (the formula stored in design(dds)), and a reduced formula, e.g. one
>> >>> > which does not contain the variable of interest."
>> >>> >
>> >>> > For your filtering for the old version of DESeq, how did you chose
>> the
>> >>> > value of theta=0.2? Note that such filtering in DESeq2 is
>> accomplished
>> >>> > automatically by results(), and the function optimizes for the best
>> >>> > value of theta (most adjusted p-values less than a threshold). So you
>> >>> > do not need to pre-filter.
>> >>> >
>> >>> > Mike
>> >>> >
>> >>> >
>> >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado
>> >>> > <catagui at gmail.com> wrote:
>> >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different
>> >>> number of
>> >>> > > DEGs which I will expect to be normal. But when I compare the 149
>> >>> genes
>> >>> > iD
>> >>> > > that I get with DESeq with the 869 from DESeq2 there are only ~10
>> >>> genes
>> >>> > > that are in common which I don\'92t understand  (using FDR <0.05
>> for
>> >>> > both).
>> >>> > > I want to block the Subject effect for which I am including the
>> >>> reduced
>> >>> > > formula of ~1.
>> >>> > >
>> >>> > > Shouldn\'92t this two methods output similar results?  Because at
>> the
>> >>> > > moment I could interpret my results with different ways.\
>> >>> > > \
>> >>> > > Thanks for your help,\
>> >>> > > \
>> >>> > > Catalina\
>> >>> > > \
>> >>> > > \
>> >>> > > This the DESeq script that I am using:\
>> >>> > > \
>> >>> > > \
>> >>> > > DESeq\
>> >>> > > \
>> >>> > > library(DESeq)\
>> >>> > > \
>> >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
>> >>> > > row.names=1))\
>> >>> > > \
>> >>> > > \
>> >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\
>> >>> > > \
>> >>> > > Treatment=c(rep("co",5),rep("c2",4))\
>> >>> > > a.con=cbind(Subject,Treatment)\
>> >>> > > \
>> >>> > > cds=newCountDataSet(co,a.con)\
>> >>> > > \
>> >>> > > \
>> >>> > > cds <- estimateSizeFactors( cds)\
>> >>> > > \
>> >>> > > cds <- estimateDispersions(cds,method="pooled-CR",
>> >>> > > modelFormula=count~Subject+Treatment)\
>> >>> > > \
>> >>> > > \
>> >>> > > #filtering\
>> >>> > > \
>> >>> > > rs = rowSums ( counts ( cds ))\
>> >>> > > theta = 0.2\
>> >>> > > use = (rs > quantile(rs, probs=theta))\
>> >>> > > table(use)\
>> >>> > > cdsFilt= cds[ use, ]\
>> >>> > > \
>> >>> > > \
>> >>> > > \
>> >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\
>> >>> > > \
>> >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\
>> >>> > > \
>> >>> > > pvals <- nbinomGLMTest (fit1, fit0)\
>> >>> > > \
>> >>> > > \
>> >>> > > padj <- p.adjust( pvals, method="BH" )\
>> >>> > > \
>> >>> > > padj <- data.frame(padj)\
>> >>> > > \
>> >>> > > row.names(padj)=row.names(cdsFilt)\
>> >>> > > \
>> >>> > > padj_fil <- subset (padj,padj <0.05 )\
>> >>> > > \
>> >>> > > dim (padj_fil)\
>> >>> > > \
>> >>> > > [1] 149   1\
>> >>> > > \
>> >>> > > \
>> >>> > > ------------
>> >>> > > \
>> >>> > > library ("DESeq2")\
>> >>> > > \
>> >>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T,
>> >>> sep=",",
>> >>> > > row.names=1))\
>> >>> > > \
>> >>> > > coldata= read.table ("targets.csv", header = T,
>> sep=",",row.names=1)\
>> >>> > > \
>> >>> > > coldata\
>> >>> > > \
>> >>> > >>Subject Treatment\
>> >>> > >>F1       1        co\
>> >>> > >>F2       2        co\
>> >>> > >>F3       3        co\
>> >>> > >>F4       4        co\
>> >>> > >>F5       5        co\
>> >>> > >>H1       1        c2\
>> >>> > >>H2       2        c2\
>> >>> > >>H4       4        c2\
>> >>> > >>H5       5        c2\
>> >>> > > \
>> >>> > > dds <- DESeqDataSetFromMatrix(\
>> >>> > >   countData = countdata,\
>> >>> > >   colData = coldata,\
>> >>> > >   design = ~ Subject + Treatment)\
>> >>> > > dds\
>> >>> > > \
>> >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\
>> >>> > > \
>> >>> > > \
>> >>> > > dds <- estimateSizeFactors( dds)\
>> >>> > > \
>> >>> > > dds <- estimateDispersions(dds)\
>> >>> > > \
>> >>> > > \
>> >>> > > rs = rowSums ( counts ( dds ))\
>> >>> > > theta = 0.2\
>> >>> > > use = (rs > quantile(rs, probs=theta))\
>> >>> > > table(use)\
>> >>> > > ddsFilt= dds[ use, ]\
>> >>> > > \
>> >>> > > \
>> >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\
>> >>> > > \
>> >>> > > resLRT <- results(dds)\
>> >>> > > \
>> >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\
>> >>> > > \
>> >>> > > #[1] 869}
>> >>> > >
>> >>> > >         [[alternative HTML version deleted]]
>> >>> > >
>> >>> > > _______________________________________________
>> >>> > > Bioconductor mailing list
>> >>> > > Bioconductor at r-project.org
>> >>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>> > > Search the archives:
>> >>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> *Catalina Aguilar Hurtado*
>> >>>
>> >>>  PhD Candidate
>> >>> ARC Centre of Excellence for Coral Reef Studies
>> >>> Department of Biochemistry and Molecular Biology
>> >>> AIMS at JCU
>> >>> James Cook University
>> >>> Townsville, QLD 4811, Australia
>> >>> ph: +61 7 4781 6009
>> >>>
>> >>>         [[alternative HTML version deleted]]
>> >>>
>> >>> _______________________________________________
>> >>> Bioconductor mailing list
>> >>> Bioconductor at r-project.org
>> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>> Search the archives:
>> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>
>> >>
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>         [[alternative HTML version deleted]]
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list