[BioC] Too many DEG with edgeR output?

Gordon K Smyth smyth at wehi.EDU.AU
Thu Oct 24 01:02:29 CEST 2013


Dear Bernardo,

> Date: Mon, 21 Oct 2013 11:23:57 +0200
> From: Bernardo Bello <popnard at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Too many DEG with edgeR output?
>
> Hi there,
>
> I'm dealing with bacterial RNA-seq analysis. My experiment is very simple.
> Two samples to compare and no replicates.

Simplest to do but hardest to analyse.

> Reads were generated with Ion Torrent PGM using 316 chip. One for each 
> sample and performed in different days.
>
> Since I had too many differentially expressed genes, Should I be more
> conservative assigning edgeR dispersion value?

Well, in the absence of replicates you are simply picking a dispersion 
value out of the air as it were.  So, if you get more DE gene than you 
want, why not use a larger dispersion value?

Otherwise you might try a new feature of edgeR to try to estimate the 
dispersions.  First, make a DGEList object with both libraries as one 
group:

   d2 <- d
   d2$samples$group <- c(1,1)

Then estimate the dispersion robustly:

   d2 <- estimateDisp(d2, robust=TRUE, winsor.tail.p=c(0.05,0.4))
   plotBCV(d2)

Now copy the trended dispersion to your original object:

   d$trended.dispersion <- d2$trended.dispersion

Now do the tests:

   et <- exactTest(d)

Best wishes
Gordon


> Also, there are considerable more up-regulated genes in *exvivo* than in 
> plate sample.

I can't help you with that, if it is indeed a problem.

Best wishes
Gordon

> See logFC_vs_logCPM figure:
>
> https://docs.google.com/file/d/0B8-ZAuZe8jldY0Y5SWJSdXh1WGc/edit?usp=sharing
>
> Thanks for you help, Bernardo
>
>
>
> - tmap code:
>
> tmap map2 -f HPNK_clean.fsa -r exvivo.fastq -i fastq -s exvivo.sam --verbose
>
> tmap map2 -f HPNK_clean.fsa -r plate.fastq -i fastq -s plate.sam --verbose
>
> - Flasgstat:
>
> exvivo:
>
>> 3240242 + 0 in total (QC-passed reads + QC-failed reads)
>
>
>> 2132481 + 0 mapped (65.81%:nan%)
>
> plate:
>
>> 3774075 + 0 in total (QC-passed reads + QC-failed reads)
>
>
>> 3510438 + 0 mapped (93.01%:nan%)
>
>
> - count:
>
>    python -m HTSeq.scripts.count -m intersection-nonempty -t CDS -i ID
> exvivo.sam HPNK.gff > exvivo.counts
>
>    python -m HTSeq.scripts.count -m intersection-nonempty -t CDS -i ID
> plate.sam HPNK.gff > plate.counts
>
>
> - count stats:
>
> ex-vivo stats
>
>> no_feature 777946
>
>> ambiguous 1
>
>> too_low_aQual 0
>
>> not_aligned 1107761
>
>> alignment_not_unique 0
>
> plate stats
>
>> no_feature 776707
>
>> ambiguous 47
>
>> too_low_aQual 0
>
>> not_aligned 263637
>
>> alignment_not_unique 0
>
> - edgeR code:
>
> library(edgeR)
>
> files <- dir(pattern="*\\.counts$")
>
> RG <- readDGE(files, header=FALSE)
>
> RG
>
> keep <- rowSums(cpm(RG)>1) >= 2 #we keep genes that achieve at least one
> count per million (cpm) in at least three samples
>
> RG <- RG[keep,]
>
> dim(RG)
>
>
> RG <- calcNormFactors(RG)
>
> RG$samples
>
>
> plotMDS(RG)
>
>
> bcv <- 0.2 #Assigned dispersion value of 0.2
>
> m <- as.matrix(RG)
>
> d <- DGEList(counts=m, group=(1:2)) #modify 'group' depending on sample
> number. Also can be adapted to replicated samples, see'?DGEList'.
>
> d
>
> et <- exactTest(d, pair=(1:2),dispersion=bcv^2) #exactTest(RG,
> pair=(1:2),dispersion=bcv^2)
>
> et
>
> top <- topTags(et)
>
> top
>
> cpm(RG)[rownames(top), ] #Check the individual cpm values for the top genes:
>
> summary(de <- decideTestsDGE(et)) #The total number of DE genes at 5% FDR
> is given by'decideTestsDGE'.
>
>
>   [,1]
>
> -1  200
>
> 0  1176
>
> 1   769
>
> Of the 'number' tags identified as DE, 769 are up-regulated ex-vivo and 200
> are down-regulated.
>
> detags <- rownames(RG)[as.logical(de)] #detags contains the DE genes at 5%
> FDR
>
> plotSmear(et, de.tags=detags) #plot all genes and highlight DE genes at 5%
> FDR
>
> abline(h=c(-1, 1), col="blue") #The blue lines indicate 2-fold changes.
>
> title("plate vs ex-vivo")
>
> dev.copy2pdf(file = "Figure_1.pdf") #Save as .pdf##
>
> -- 
>
> *Bernardo Bello Ortí*
>
> PhD student
>
> CReSA-IRTA
>
> Campus de Bellaterra-Universitat Autònoma de Barcelona
>
> Edifici CReSA
>
> 08193  Bellaterra (Barcelona, Spain)
>
> Tel.: 647 42 52 63 *www.cresa.es  *
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list