[BioC] DESeq2, problems using list argument to extract results from DESeq
Michael Love
michaelisaiahlove at gmail.com
Thu Jul 31 15:50:59 CEST 2014
hi Jon,
On Thu, Jul 31, 2014 at 2:13 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:
> Hi,
>
>> dds$condition
> [1] Early vitellogenesis Early vitellogenesis Early vitellogenesis Late
> vitellogenesis
> [5] Late vitellogenesis Late vitellogenesis Late vitellogenesis
> Fertilization
> [9] Fertilization Early cleavage Early cleavage Late
> cleavage
> [13] Late cleavage Early preinversion Early preinversion Late
> preinversion
> [17] Late preinversion Early postinversion Early postinversion Late
> postinversion
> [21] Non reproductive Non reproductive Non reproductive Non
> reproductive
> [25] Non reproductive Non reproductive
> 10 Levels: Non reproductive Early vitellogenesis Late vitellogenesis
> Fertilization ... Late postinversion
>
>> dds
> class: DESeqDataSet
> dim: 6137 26
> exptData(0):
> assays(3): counts mu cooks
> rownames(6137): scign000005 scign000006 ... scign021663 scign021669
> rowData metadata column names(57): baseMean baseVar ... deviance maxCooks
> colnames(26): 05.apr 11.apr ... R2D0T1 R3D0T1
> colData names(2): condition sizeFactor
>
>
> On 31. juli 2014, at 02:28, Michael Love wrote:
>
> Hi Jon,
>
> On Jul 30, 2014 3:27 PM, "Jon Bråte"
>> thank you for your answer. What I am interested in is to find genes that
>> possibly play a part in embryogenesis. In other words, genes that are
>> up-regulated in any of the embryogenesis stages compared to the
>> non-reproductive stages.
>
> Can you tell us the levels of condition which are embryogenesis vs those
> which are non-reproductive?
>
>>
>> I have tried three different approaches, but struggle to understand what
>> they mean, how they differ, and if they actually make sense. I try to
>> explain what I did, sorry if it is difficult to read:
>>
>>
>> First I tried to contrast all the embryogenesis stages compared to the non
>> reproductive using the Wald test in DESeq:
>>
>> dds = DESeq(dds)
>> resCtrst = results(dds, contrast=list(c("conditionEarly.vitellogenesis",
>> "conditionLate.vitellogenesis", "conditionFertilization",
>> "conditionEarly.cleavage",
>> "conditionLate.cleavage",
>> "conditionEarly.preinversion", "conditionLate.preinversion",
>> "conditionEarly.postinversion",
>> "conditionLate.postinversion"), "conditionNon.reproductive"))
Here you need to include the results() argument listValues=c(1/9, -1).
This is to compare the average of the 9 embryogenic levels to the 1
non-reproductive level. This seems like the correct results table for
your question. Additionally you can perform individual contrasts of
one level with the non-reproductive level:
res = results(dds,
contrast=c("condition","Early.vitellogenesis","Non.reproductive"))
>> resCtrst = resCtrst[order(resCtrst$padj),]
>> resSigCtrst = resCtrst[which(resCtrst$padj < 0.1), ]
>> resSigCtrstPositive = resSigCtrst[which(resSigCtrst$log2FoldChange > 0), ]
>> write.csv2(resSigCtrstPositive,
>> file="DESeq2_Wald_test_Sig_upreg_all_cond_vs_non_repr.csv")
>>
>> This gave me 1736 upregulated genes.
>>
>>
>> Second I ran an LTR test like this:
>>
>> ddsLRT = DESeq(dds, test="LRT", full=~condition, reduced=~1)
>> resLRT = results(ddsLRT)
>> resLRT = resLRT[order(resLRT$padj),]
>> resSigLRT = resLRT[which(resLRT$padj < 0.1), ]
>> resSigLRTPositive = resSigLRT[which(resSigLRT$log2FoldChange > 0), ]
>> write.csv2(resSigLRT, file="DESeq2_Sig_LRT_upreg.csv")
>>
>> This gave me 3158 upregulated genes (all genes have positive log2fold
>> changes so I guess I cannot see which are up or down regulated?)
>>
The log2 fold change which is shown is for just one of the 9
coefficients in the model (the last level over the base level).
In the ?results help file we have:
For analyses using the likelihood ratio test (using ‘nbinomLRT’),
the p-values are determined solely by the difference in deviance
between the full and reduced model formula. A log2 fold change is
included, which can be controlled using the ‘name’ argument, or by
default this will be the estimated coefficient for the last
element of ‘resultsNames(object)’.
>> And I don't understand exactly what is contained in resLRT, are the
>> significant genes here significantly expressed between Non reproductive
>> samples and all others compared to the null hypothesis?
The LRT in this case is testing for any changes across *any* levels of
condition. So this sounds like not the test which you are interested
in answering.
>> Or is it just for
>> the two conditions Late.postinversion vs Non reproductive?:
>>
>> mcols(resLRT)
>> DataFrame with 6 rows and 2 columns
>> type
>> description
>> <character>
>> <character>
>> 1 intermediate the base mean over
>> all
>> rows
>> 2 results log2 fold change: condition Late.postinversion vs
>> Non.reproductive
>> 3 results standard error: condition Late.postinversion vs
>> Non.reproductive
>> 4 results LRT statistic: '~ condition'
>> vs
>> '~ 1'
>> 5 results LRT p-value: '~ condition'
>> vs
>> '~ 1'
>> 6 results BH adjusted
>> p-values
>>
>>
>> Third I tried to compare the Wald test and the LTR:
>>
>> res = results(dds) #should I use resCtrst instead of res??
from the ?results help file:
If ‘results’ is run without specifying
‘contrast’ or ‘name’, it will return the comparison of the last
level of the last variable in the design formula over the first
level of this variable.
Mike
>> tab = table(Wald=res$padj < .1, LRT=resLRT$padj < .1)
>> addmargins(tab)
>>
>> LRT
>> Wald FALSE TRUE Sum
>> FALSE 1027 1766 2793
>> TRUE 526 747 1273
>> Sum 1553 2513 4066
>>
>>
>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> base
>>
>> other attached packages:
>> [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2
>> GenomicRanges_1.16.3
>> [5] GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.26.0 Biobase_2.24.0 DBI_0.2-7
>> RColorBrewer_1.0-5 RSQLite_0.11.4
>> [6] XML_3.98-1.1 XVector_0.4.0 annotate_1.42.0
>> genefilter_1.46.1 geneplotter_1.42.0
>> [11] grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1
>> splines_3.1.0 stats4_3.1.0
>> [16] survival_2.37-7 tools_3.1.0 xtable_1.7-3
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> ----------------------------------------------------------------
> Jon Bråte
>
> Section for Genetics and Evolutionary Biology (EVOGENE)
> Department of Biosciences
> University of Oslo
> P.B. 1066 Blindern
> N-0316, Norway
> Email: jon.brate at ibv.uio.no
> Phone: 922 44 582
> Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html
>
>
>
>
More information about the Bioconductor
mailing list