[BioC] DESeq vs DESeq2 have different DEGs results

Tue May 20 14:12:43 CEST 2014

hi Catalina,

On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado
<catagui at gmail.com> wrote:
> Hi Mike,
>
> Not sure what you mean by the intersection.

I meant those genes which had padj < cutoff for both DESeq and DESeq2.

You can calculate this with

length(intersect(padj1rownames, padj2rownames))

It's safest to work with the rownames, to make sure you are referring
to the same genes.

You can get this vector, e.g. in DESeq2 with:

padj2rownames <- rownames(results[which(results$padj < cutoff),])

> I have run both without any
> filter and reduced formula of ~Subject. As you can see below, there is
> almost no DEGs and the only way I could get was when ~1.
>

This means that, without independent filtering, for most genes, the
treatment effect was too small to observe a significant difference for
the experiment's sample size and/or sequencing depth. However, you
should use independent filtering to report as many genes as possible,
and I'd recommend using DESeq2 which will do this automatically within
the results() function.

The reason I asked for you to not use independent filtering in both is
to see if in fact there were many genes called by either version of
DESeq but not by both, when the settings were made comparable.

Note that the ~ 1 reduced design is not what you want:

OK: "we tested for the treatment effect, controlling for the subject effect"
full: ~ subject + treament
reduced: ~ subject

not ok: "we tested for the treatment and subject effect"
full: ~ subject + treament
reduced: ~ 1

Mike

> DESeq2:
>
>> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject)
>> resLRT <- results(dds, independentFiltering=FALSE)
>
>> sum( resLRT$padj < 0.5, na.rm=TRUE )
> [1] 6
>
>
> DESeq:
>
>
>>fit0 <- fitNbinomGLMs (cds, count~Subject)
>
>>fit1 <- fitNbinomGLMs (cds, count~Treatment)
>
>>pvals <- nbinomGLMTest (fit1, fit0)
>
>> padj <- p.adjust( pvals, method="BH" )
>> padj <- data.frame(padj)
>
>> row.names(padj)=row.names(cds)
>> padj_fil <- subset (padj,padj <0.05 )
>> dim (padj_fil)
> [1] 0 1
>
> Thanks,
>
> Cata
>
> On Mon, May 19, 2014 at 10:31 PM, Michael Love
> <michaelisaiahlove at gmail.com>wrote:
>
>> Hi Catalina,
>>
>> What are the numbers exactly and the intersection for using a reduced
>> design of ~ subject for both, and not filtering in either case
>> (independentFiltering = FALSE)?
>>
>> Mike
>> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <catagui at gmail.com>
>> wrote:
>>
>>> Hi Mike,
>>>
>>> Thanks for your reply.
>>>
>>> I had read the manual and did try the reduced design ~Subject and got 5
>>> DEGs. I tried with both DESeq and DESeq2.
>>>
>>> My data has a stronger effect of the subject compare to the treatment, if
>>> I
>>> do a heatmap the samples group by subject rather than by treatment.
>>>
>>> The filtering theta value was choose based on a plot of "Number of DEGs vs
>>> theta" when using filtered_R in genefilter. Thanks for pointing out that
>>> results() was doing automatically. I have just run it again without
>>> pre-filter and got 812 DEGs.
>>>
>>> Still my concern is why the DEGs that I get in DESeq disappear in the
>>> DESeq2 results?
>>>
>>> Thanks again,
>>>
>>> Catalina
>>>
>>>
>>> On Fri, May 16, 2014 at 10:35 PM, Michael Love
>>> <michaelisaiahlove at gmail.com>wrote:
>>>
>>> > hi Catalina,
>>> >
>>> > If you want to test for the treatment effect and control the subject
>>> > effect you should use the reduced design of ~ subject.
>>> >
>>> > In the DESeq2 vignette we say: "the user provides the full formula
>>> > (the formula stored in design(dds)), and a reduced formula, e.g. one
>>> > which does not contain the variable of interest."
>>> >
>>> > For your filtering for the old version of DESeq, how did you chose the
>>> > value of theta=0.2? Note that such filtering in DESeq2 is accomplished
>>> > automatically by results(), and the function optimizes for the best
>>> > value of theta (most adjusted p-values less than a threshold). So you
>>> > do not need to pre-filter.
>>> >
>>> > Mike
>>> >
>>> >
>>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado
>>> > <catagui at gmail.com> wrote:
>>> > > Hi I want to compare DESeq vs DESeq2 and I am getting different
>>> number of
>>> > > DEGs which I will expect to be normal. But when I compare the 149
>>> genes
>>> > iD
>>> > > that I get with DESeq with the 869 from DESeq2 there are only ~10
>>> genes
>>> > > that are in common which I don\'92t understand  (using FDR <0.05 for
>>> > both).
>>> > > I want to block the Subject effect for which I am including the
>>> reduced
>>> > > formula of ~1.
>>> > >
>>> > > Shouldn\'92t this two methods output similar results?  Because at the
>>> > > moment I could interpret my results with different ways.\
>>> > > \
>>> > > Thanks for your help,\
>>> > > \
>>> > > Catalina\
>>> > > \
>>> > > \
>>> > > This the DESeq script that I am using:\
>>> > > \
>>> > > \
>>> > > DESeq\
>>> > > \
>>> > > library(DESeq)\
>>> > > \
>>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
>>> > > row.names=1))\
>>> > > \
>>> > > \
>>> > > Subject=c(1,2,3,4,5,1,2,4,5)\
>>> > > \
>>> > > Treatment=c(rep("co",5),rep("c2",4))\
>>> > > a.con=cbind(Subject,Treatment)\
>>> > > \
>>> > > cds=newCountDataSet(co,a.con)\
>>> > > \
>>> > > \
>>> > > cds <- estimateSizeFactors( cds)\
>>> > > \
>>> > > cds <- estimateDispersions(cds,method="pooled-CR",
>>> > > modelFormula=count~Subject+Treatment)\
>>> > > \
>>> > > \
>>> > > #filtering\
>>> > > \
>>> > > rs = rowSums ( counts ( cds ))\
>>> > > theta = 0.2\
>>> > > use = (rs > quantile(rs, probs=theta))\
>>> > > table(use)\
>>> > > cdsFilt= cds[ use, ]\
>>> > > \
>>> > > \
>>> > > \
>>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\
>>> > > \
>>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\
>>> > > \
>>> > > pvals <- nbinomGLMTest (fit1, fit0)\
>>> > > \
>>> > > \
>>> > > padj <- p.adjust( pvals, method="BH" )\
>>> > > \
>>> > > padj <- data.frame(padj)\
>>> > > \
>>> > > row.names(padj)=row.names(cdsFilt)\
>>> > > \
>>> > > padj_fil <- subset (padj,padj <0.05 )\
>>> > > \
>>> > > dim (padj_fil)\
>>> > > \
>>> > > [1] 149   1\
>>> > > \
>>> > > \
>>> > > ------------
>>> > > \
>>> > > library ("DESeq2")\
>>> > > \
>>> > > countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T,
>>> sep=",",
>>> > > row.names=1))\
>>> > > \
>>> > > coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\
>>> > > \
>>> > > coldata\
>>> > > \
>>> > >>Subject Treatment\
>>> > >>F1       1        co\
>>> > >>F2       2        co\
>>> > >>F3       3        co\
>>> > >>F4       4        co\
>>> > >>F5       5        co\
>>> > >>H1       1        c2\
>>> > >>H2       2        c2\
>>> > >>H4       4        c2\
>>> > >>H5       5        c2\
>>> > > \
>>> > > dds <- DESeqDataSetFromMatrix(\
>>> > >   countData = countdata,\
>>> > >   colData = coldata,\
>>> > >   design = ~ Subject + Treatment)\
>>> > > dds\
>>> > > \
>>> > > dds$Treatment <- relevel (dds$Treatment, "co")\
>>> > > \
>>> > > \
>>> > > dds <- estimateSizeFactors( dds)\
>>> > > \
>>> > > dds <- estimateDispersions(dds)\
>>> > > \
>>> > > \
>>> > > rs = rowSums ( counts ( dds ))\
>>> > > theta = 0.2\
>>> > > use = (rs > quantile(rs, probs=theta))\
>>> > > table(use)\
>>> > > ddsFilt= dds[ use, ]\
>>> > > \
>>> > > \
>>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\
>>> > > \
>>> > > resLRT <- results(dds)\
>>> > > \
>>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\
>>> > > \
>>> > > #[1] 869}
>>> > >
>>> > >         [[alternative HTML version deleted]]
>>> > >
>>> > > _______________________________________________
>>> > > Bioconductor mailing list
>>> > > Bioconductor at r-project.org
>>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> > > Search the archives:
>>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >
>>>
>>>
>>>
>>> --
>>> *Catalina Aguilar Hurtado*
>>>
>>>  PhD Candidate
>>> ARC Centre of Excellence for Coral Reef Studies
>>> Department of Biochemistry and Molecular Biology
>>> AIMS at JCU
>>> James Cook University
>>> Townsville, QLD 4811, Australia
>>> ph: +61 7 4781 6009
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor