[Bioc-devel] R: Re: Re: Deseq2 and differentia expression

Michael Love michaelisaiahlove at gmail.com
Fri Jul 11 17:07:44 CEST 2014


hi Jarod,

Please take a look at the beginner vignette for DESeq2. We explain a
lot of the questions you are asking, including how to subset the
object based on adjusted p-value. Please take a look a my previous
email as well, I demonstrated how to subset the results object. As you
can see in the vignette R output, the rownames of the object returned
by results inherits the rownames of dds. So this gives the gene names:

rownames(res)

Mike

On Fri, Jul 11, 2014 at 10:55 AM, jarod_v6 at libero.it <jarod_v6 at libero.it> wrote:
> Dear All I don't understand. I want to extract only genes have a fold-change
> more than 0.5
> I use this comand but I have all the genes inside:
>  resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs") #greater
> dim(resGA)
>>> [1] 62893     6
>>>> dim(resGA2)
>>> [1] 62893     6
>>>> dim(resGA3)
>>> [1] 62893     6
>>>
> how can extract the names of genes are in resGA2?
> thanks for the patience!
> j.
>
>
>>----Messaggio originale----
>>Da: michaelisaiahlove at gmail.com
>>Data: 11/07/2014 15.15
>>A: "jarod_v6 at libero.it"<jarod_v6 at libero.it>
>>Cc: "bioc-devel at r-project.org"<bioc-devel at r-project.org>
>>Ogg: Re: Re: [Bioc-devel] Deseq2 and differentia expression
>>
>>hi Jarod,
>>
>>This is more of a main Bioc mailing list question, so you can address
>>future questions there.
>>
>>On Fri, Jul 11, 2014 at 6:05 AM, jarod_v6 at libero.it <jarod_v6 at libero.it>
> wrote:
>>> Dear Dr,
>>> Thanks so much for clarification!!!
>>> So I try the test of log fold change but I'm bit confusion on the results:
>>> If I interested in the genes that have a foldchange more than 0.5 and 2 I
> need
>>> to use this comand is it right?
>>
>>the second and third results() commands below give you this.
>>
>>> ddsNoPrior <- DESeq(ddHTSeq, betaPrior=FALSE) #only for lessABs
>>>
>>> resGA <- results(ddsNoPrior, lfcThreshold=.5, altHypothesis="lessAbs")
>>> #greater tdi
>>> resGA2 <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")
> #greater
>>> tdi
>>> resGA3 <- results(dds, lfcThreshold=2, altHypothesis="greaterAbs") #greater
>>> tdi
>>>
>>> dim(resGA)
>>> [1] 62893     6
>>>> dim(resGA2)
>>> [1] 62893     6
>>>> dim(resGA3)
>>> [1] 62893     6
>>>
>>> The number of gene select it is always the same.. Where is my mistake!
>>> thanks in advance!
>>>
>>
>>DESeq2 returns the results for all the genes in the same order as the
>>original object. You need to specify a threshold on adjusted p-value.
>>
>>table(res$padj < 0.1)
>>
>>You can use subset(res, padj < 0.1) to filter the DataFrame.
>>
>>>
>>>>----Messaggio originale----
>>>>Da: michaelisaiahlove at gmail.com
>>>>Data: 10/07/2014 14.46
>>>>A: "jarod_v6 at libero.it"<jarod_v6 at libero.it>
>>>>Cc: "bioc-devel at r-project.org"<bioc-devel at r-project.org>
>>>>Ogg: Re: [Bioc-devel] Deseq2 and differentia expression
>>>>
>>>>hi Jarod,
>>>>
>>>>On Thu, Jul 10, 2014 at 7:59 AM, jarod_v6 at libero.it <jarod_v6 at libero.it>
>>> wrote:
>>>>> Hi there!!!
>>>>>
>>>>> I have did this code:
>>>>> SampleTable <-data.frame(SampleName=metadata$ID_CLINICO,
>>> fileName=metadata$NOME,
>>>>> condition=metadata$CONDITION,prim=metadata$CDT)
>>>>> ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="
>>>>> Count/", design= ~condition) # effetto dello mutazione
>>>>> ddHTSeq$condition <- relevel(ddHTSeq$condition, "NVI")# quindi verso non
>>>>> viscerali
>>>>> dds <- DESeq(ddHTSeq)
>>>>> res <-results(dds)
>>>>>
>>>>> resOrdered <- res[order(res$padj),]
>>>>> head(resOrdered)
>>>>> ResSig <- res[ which(res$padj < 0.1 ), ]
>>>>>
>>>>>
>>>>> I want to select some data. How can I do? which is the good cut-off on
> FDR
>>>>> values?
>>>>
>>>>The code above does the selection on adjusted p-value. The right FDR
>>>>cutoff is up to you, what percent of false discoveries is tolerable in
>>>>the final list of genes? The considerations are: the cost of
>>>>validation or following up on a false discovery, versus the cost of a
>>>>missed discovery. These are hard to quantify even if you know all the
>>>>details of an experiment.
>>>>
>>>>> All the data have a FDR less thank 0.1 . :
>>>>> Is it right this comand?
>>>>> res[ which(res$padj < 0.1 ), ]
>>>>>
>>>>
>>>>yes. The which() is necessary because some of the res$padj have NA. If
>>>>you have a logical vector with NA, you cannot directly index a
>>>>DataFrame, but you can index after calling which(), which will return
>>>>the numeric index of the TRUE's. You could also subset with:
>>>>subset(res, padj < 0.1).
>>>>
>>>>The reason for the NAs is explained in the vignette: "Note that some
>>>>values in the results table can be set to NA, for either one of the
>>>>following reasons:..."
>>>>
>>>>
>>>>> How many significant genes are with FDR less than 0.1 and  have an
> absolute
>>>>> value of  foldchange  more of 1 ? I  have and error on this. I have many
> NA
>>>>> values.
>>>>>
>>>>> If I try this code I have the follow errors
>>>>>> significant.genes = res[(res$padj < .05 & abs(res$log2FoldChange) >= 1
> ),]
>>> #
>>>>> Set thethreshold for the log2 fold change.
>>>>> Error in normalizeSingleBracketSubscript(i, x, byrow = TRUE, exact =
> FALSE)
>>> :
>>>>>   subscript contains NAs
>>>>>
>>>>
>>>>This is not the recommended way to filter on large log fold changes.
>>>>We have implemented a test specifically for this, check the vignette
>>>>section on "Tests of log2 fold change above or below a threshold"
>>>>
>>>>Mike
>>>>
>>>>> How can I resolve this problenms?
>>>>> thanks in advance for the help
>>>>>
>>>>>
>>>>>
>>>>> R version 3.1.0 (2014-04-10)
>>>>> Platform: i686-pc-linux-gnu (32-bit)
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] splines   parallel  stats     graphics  grDevices utils     datasets
>>>>> [8] methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] annotate_1.40.1         RColorBrewer_1.0-5      gplots_2.14.1
>>>>>  [4] org.Hs.eg.db_2.10.1     ReportingTools_2.4.0    AnnotationDbi_1.24.0
>>>>>  [7] RSQLite_0.11.4          DBI_0.2-7               knitr_1.6
>>>>> [10] biomaRt_2.18.0          DESeq2_1.4.5            RcppArmadillo_0.
>>> 4.320.0
>>>>> [13] Rcpp_0.11.2             GenomicRanges_1.14.4    XVector_0.2.0
>>>>> [16] IRanges_1.20.7          affy_1.40.0             NOISeq_2.6.0
>>>>> [19] Biobase_2.22.0          BiocGenerics_0.8.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>  [1] affyio_1.30.0            AnnotationForge_1.4.4    BiocInstaller_1.
>>>>> 12.1
>>>>>  [4] Biostrings_2.30.1        biovizBase_1.10.8        bitops_1.0-
>>>>> 6
>>>>>  [7] BSgenome_1.30.0          Category_2.28.0          caTools_1.
>>>>> 17
>>>>> [10] cluster_1.15.2           colorspace_1.2-4         dichromat_2.0-
>>>>> 0
>>>>> [13] digest_0.6.4             edgeR_3.4.2              evaluate_0.
>>>>> 5.5
>>>>> [16] formatR_0.10             Formula_1.1-1            gdata_2.
>>>>> 13.3
>>>>> [19] genefilter_1.44.0        geneplotter_1.40.0       GenomicFeatures_1.
>>>>> 14.5
>>>>> [22] ggbio_1.10.16            ggplot2_1.0.0            GO.db_2.
>>>>> 10.1
>>>>> [25] GOstats_2.28.0           graph_1.40.1             grid_3.
>>>>> 1.0
>>>>> [28] gridExtra_0.9.1          GSEABase_1.24.0          gtable_0.
>>>>> 1.2
>>>>> [31] gtools_3.4.1             Hmisc_3.14-4             hwriter_1.
>>>>> 3
>>>>> [34] KernSmooth_2.23-12       lattice_0.20-29          latticeExtra_0.6-
>>>>> 26
>>>>> [37] limma_3.18.13            locfit_1.5-9.1           MASS_7.3-
>>>>> 33
>>>>> [40] Matrix_1.1-4             munsell_0.4.2            PFAM.db_2.
>>>>> 10.1
>>>>> [43] plyr_1.8.1               preprocessCore_1.24.0    proto_0.3-
>>>>> 10
>>>>> [46] RBGL_1.38.0              RCurl_1.95-4.1           reshape2_1.
>>>>> 4
>>>>> [49] R.methodsS3_1.6.1        R.oo_1.18.0              Rsamtools_1.
>>>>> 14.3
>>>>> [52] rtracklayer_1.22.7       R.utils_1.32.4           scales_0.
>>>>> 2.4
>>>>> [55] stats4_3.1.0             stringr_0.6.2            survival_2.37-
>>>>> 7
>>>>> [58] tools_3.1.0              VariantAnnotation_1.8.13 XML_3.98-
>>>>> 1.1
>>>>> [61] xtable_1.7-3             zlibbioc_1.8.0
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list