[BioC] Interpreting DESeq2 results

Wed Apr 10 21:25:02 CEST 2013

On Mar 28, 2013, at 4:19 PM, Michael Love wrote:

> Hi Michael,
> 
> The baseMean column is not on the log scale; it is the mean of normalized counts for a gene. The intercept from the GLM is labelled intercept in mcols(dse).
> 
> Mike
> 
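
To make the reply above concrete for myself, here is a minimal sketch of "mean of normalized counts", assuming that normalization means dividing each sample's counts by that sample's size factor. The count matrix and the size factors are invented for illustration and are not from my experiment.

```r
# Toy raw counts: 2 genes (rows) x 4 samples (columns). Values are made up.
counts <- matrix(c(100, 200, 150, 250,
                    10,  20,  15,  25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("gene_A", "gene_B"), paste0("s", 1:4)))

# Hypothetical per-sample size factors (normally estimated from the data).
size_factors <- c(s1 = 0.8, s2 = 1.2, s3 = 1.0, s4 = 1.0)

# Divide each column by its sample's size factor, then average over samples.
normalized <- sweep(counts, 2, size_factors, "/")
base_mean <- rowMeans(normalized)

print(base_mean)        # one value per gene, on the count scale
print(log2(base_mean))  # log2 of that mean, for comparison with log2-scale terms
```

If this reading is right, the value depends only on the gene's counts and not on the design, which would explain why baseMean is identical for a gene in every coefficient's table.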
Hello again,

Here's a snippet of output for the "Intercept" term.

> head(res.mm9[["Intercept"]])
DataFrame with 6 rows and 4 columns
                     baseMean log2FoldChange        pvalue           FDR
                    <numeric>      <numeric>     <numeric>     <numeric>
ENSMUSG00000000001 4160.27257      12.107650  0.000000e+00  0.000000e+00
ENSMUSG00000000028  127.54781       7.001754  0.000000e+00  0.000000e+00

And here's a snippet for a two-level factor:

> head(res.mm9[["day14"]])
DataFrame with 6 rows and 4 columns
                     baseMean log2FoldChange       pvalue          FDR
                    <numeric>      <numeric>    <numeric>    <numeric>
ENSMUSG00000000001 4160.27257    -0.06449054 4.578027e-02 1.042132e-01
ENSMUSG00000000028  127.54781    -0.05709357 3.020473e-01 4.500798e-01

I'm still unclear about how to write down the coefficients for the model. The link function is log2(mean), correct? So is the "log2FoldChange" the particular value of beta for that coefficient?

Would I write something like

    y_tilde(ENSMUSG00000000001) = 12.11 - 0.06449 + other terms?
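
To spell out what I mean, here is a minimal sketch using the gene_A values from my earlier message (quoted below), since all of its coefficients are listed there. It assumes two things that I am asking to have confirmed: that each log2FoldChange is the coefficient beta on the log2 scale, and that the fitted value is for a sample with size factor 1. The helper fitted_mean is a hypothetical name of my own, not a DESeq2 function.

```r
# Coefficients for gene_A, copied from the quoted results() output (log2 scale).
beta <- c(Intercept      = 10.77485,
          factor1_level2 =  0.3386776,
          factor2_level2 = -0.6882543,
          factor2_level3 =  0.2393368,
          factor3_level2 =  0.1584153,
          factor3_level3 = -1.627898)

# Hypothetical helper: add the intercept to the coefficients of the
# non-reference levels a sample belongs to, then undo the log2 link.
fitted_mean <- function(beta, active = character(0)) {
  log2_mean <- beta[["Intercept"]] + sum(beta[active])
  c(log2_mean = log2_mean, mean = 2^log2_mean)
}

# Sample at the reference level of every factor.
print(fitted_mean(beta))

# Sample with factor1 = level2 and factor3 = level3, factor2 at its reference.
print(fitted_mean(beta, c("factor1_level2", "factor3_level3")))
```

Under the same assumptions, the line above for ENSMUSG00000000001 would be the log2 of the fitted mean for a day14 sample at the reference level of the other factors.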

Thanks

Mike

> On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at hudsonalpha.org> wrote:
> Greetings
> 
> I have an experiment:
> 
> > design(dse)
> ~ factor1 + factor2 + factor3
> 
> where factor1 has two levels, factor2 has three levels and factor3 has three levels. I extract a gene of interest from the results for each term (I've changed the indices to reflect the condition):
> 
> > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
> [["Intercept"]]
>         baseMean log2FoldChange        pvalue           FDR
> gene_A 1596.548       10.77485 3.309439e-216 7.025442e-216
> [["factor1_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548      0.3386776 0.1307309 0.3587438
> [["factor2_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548     -0.6882543 0.0613569 0.1007896
> [["factor2_level3"]]
>         baseMean log2FoldChange   pvalue       FDR
> gene_A 1596.548      0.2393368 0.513216 0.6589575
> [["factor3_level2"]]
>         baseMean log2FoldChange    pvalue       FDR
> gene_A 1596.548      0.1584153 0.6423634 0.8503163
> [["factor3_level3"]]
>         baseMean log2FoldChange       pvalue         FDR
> gene_A 1596.548      -1.627898 1.823141e-06 0.001409384
> 
> I want to be sure I understand the output format. Is it true that the coefficients (the vector beta) from the fit are the baseMean value scaled by the log2FoldChange? Is the true intercept value 1596.548*2^10.77485=2797274.13?
> 
> mcols() tells me that the baseMean term is calculated over "all rows". The baseMean differs from gene to gene, but for a given gene it is the same across all the conditions. I'm not seeing how the rows are selected.
> 
> Thanks
> 
> Mike
> 
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
> 
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
