[Bioc-devel] Deseq2 and differentia expression

Thu Jul 10 14:46:27 CEST 2014

hi Jarod,

On Thu, Jul 10, 2014 at 7:59 AM, jarod_v6 at libero.it <jarod_v6 at libero.it> wrote:
> Hi there!!!
>
> I have did this code:
> SampleTable <-data.frame(SampleName=metadata$ID_CLINICO,fileName=metadata$NOME,
> condition=metadata$CONDITION,prim=metadata$CDT)
> ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="
> Count/", design= ~condition) # effetto dello mutazione
> ddHTSeq$condition <- relevel(ddHTSeq$condition, "NVI")# quindi verso non
> viscerali
> dds <- DESeq(ddHTSeq)
> res <-results(dds)
>
> resOrdered <- res[order(res$padj),]
> head(resOrdered)
> ResSig <- res[ which(res$padj < 0.1 ), ]
>
>
> I want to select some data. How can I do? which is the good cut-off on FDR
> values?

The code above does the selection on adjusted p-value. The right FDR
cutoff is up to you, what percent of false discoveries is tolerable in
the final list of genes? The considerations are: the cost of
validation or following up on a false discovery, versus the cost of a
missed discovery. These are hard to quantify even if you know all the
details of an experiment.

> All the data have a FDR less thank 0.1 . :
> Is it right this comand?
> res[ which(res$padj < 0.1 ), ]
>

yes. The which() is necessary because some of the res$padj have NA. If
you have a logical vector with NA, you cannot directly index a
DataFrame, but you can index after calling which(), which will return
the numeric index of the TRUE's. You could also subset with:
subset(res, padj < 0.1).

The reason for the NAs is explained in the vignette: "Note that some
values in the results table can be set to NA, for either one of the
following reasons:..."

> How many significant genes are with FDR less than 0.1 and  have an absolute
> value of  foldchange  more of 1 ? I  have and error on this. I have many NA
> values.
>
> If I try this code I have the follow errors
>> significant.genes = res[(res$padj < .05 & abs(res$log2FoldChange) >= 1 ),] #
> Set thethreshold for the log2 fold change.
> Error in normalizeSingleBracketSubscript(i, x, byrow = TRUE, exact = FALSE) :
>   subscript contains NAs
>

This is not the recommended way to filter on large log fold changes.
We have implemented a test specifically for this, check the vignette
section on "Tests of log2 fold change above or below a threshold"

Mike

> How can I resolve this problenms?
> thanks in advance for the help
>
>
>
> R version 3.1.0 (2014-04-10)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] annotate_1.40.1         RColorBrewer_1.0-5      gplots_2.14.1
>  [4] org.Hs.eg.db_2.10.1     ReportingTools_2.4.0    AnnotationDbi_1.24.0
>  [7] RSQLite_0.11.4          DBI_0.2-7               knitr_1.6
> [10] biomaRt_2.18.0          DESeq2_1.4.5            RcppArmadillo_0.4.320.0
> [13] Rcpp_0.11.2             GenomicRanges_1.14.4    XVector_0.2.0
> [16] IRanges_1.20.7          affy_1.40.0             NOISeq_2.6.0
> [19] Biobase_2.22.0          BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] affyio_1.30.0            AnnotationForge_1.4.4    BiocInstaller_1.
> 12.1
>  [4] Biostrings_2.30.1        biovizBase_1.10.8        bitops_1.0-
> 6
>  [7] BSgenome_1.30.0          Category_2.28.0          caTools_1.
> 17
> [10] cluster_1.15.2           colorspace_1.2-4         dichromat_2.0-
> 0
> [13] digest_0.6.4             edgeR_3.4.2              evaluate_0.
> 5.5
> [16] formatR_0.10             Formula_1.1-1            gdata_2.
> 13.3
> [19] genefilter_1.44.0        geneplotter_1.40.0       GenomicFeatures_1.
> 14.5
> [22] ggbio_1.10.16            ggplot2_1.0.0            GO.db_2.
> 10.1
> [25] GOstats_2.28.0           graph_1.40.1             grid_3.
> 1.0
> [28] gridExtra_0.9.1          GSEABase_1.24.0          gtable_0.
> 1.2
> [31] gtools_3.4.1             Hmisc_3.14-4             hwriter_1.
> 3
> [34] KernSmooth_2.23-12       lattice_0.20-29          latticeExtra_0.6-
> 26
> [37] limma_3.18.13            locfit_1.5-9.1           MASS_7.3-
> 33
> [40] Matrix_1.1-4             munsell_0.4.2            PFAM.db_2.
> 10.1
> [43] plyr_1.8.1               preprocessCore_1.24.0    proto_0.3-
> 10
> [46] RBGL_1.38.0              RCurl_1.95-4.1           reshape2_1.
> 4
> [49] R.methodsS3_1.6.1        R.oo_1.18.0              Rsamtools_1.
> 14.3
> [52] rtracklayer_1.22.7       R.utils_1.32.4           scales_0.
> 2.4
> [55] stats4_3.1.0             stringr_0.6.2            survival_2.37-
> 7
> [58] tools_3.1.0              VariantAnnotation_1.8.13 XML_3.98-
> 1.1
> [61] xtable_1.7-3             zlibbioc_1.8.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel