[BioC] Question about quality of biological replicates when using edgeR
pbczyd .
harryzs1981 at gmail.com
Sat Jul 20 11:11:09 CEST 2013
Hi Mark and others ,
I am using edgeR to analyse one of my RNAseq data.
In this experiment, we have three tumour samples from three different
patients.
Named: Patient_1, Patient_2, Patient_3
We did RNASeq on Tumour-cells under two different conditions: treated and
untreated, and wanted to find differential expressed genes after treatment.
In the end, we got following 6 RNASeq data (100bp, paired-end and
HiSeq2500):
Patient_1_treated, Patient_1_untreated,
Patient_2_treated, Patient_2_untreated,
Patient_3_treated, Patient_3_untreated,
So for each condition (tread vs untreated), we have three biological
replicates.
I followed <4.4 RNA-Seq of oral carcinomas vs matched normal tissue> in
edgeR User’s Guide to analysis these data, for this case study is very
similar to what we did.
However, as you can see (in attachment), from the MDS-plot,
Patient_1_treated is very close to Patient_1_untreated on first two
dimensions, which is same for Patient_2 or Patient_3.
So, without surprise, I end up with finding 0 differential expressed genes
(FDR<0.1)
My questions are:
1.Could I say that, especially basing on the result of MDS-plot, the
biological replicates are not consistent in my case, or the quality
(fitness for purpose) is very low.
2. Should we add more replicates to increased statistic power or trying
other statistic models, or you have another suggestion to deal with this
kind of data ?
Thank you for your suggesting in advances.
Regards,
Sheng
=======================
R code:
library( "edgeR" )
files <- dir( pattern=".read_cnt", full.names = FALSE)
RG <- readDGE( files )
colnames( RG$samples )
d <- DGEList( counts = RG )
y <- d
colnames(y)<-c( "Patient_1_treated", "Patient_1_untreated",
"Patient_2_treated", "Patient_2_untreated","Patient_3_treated",
"Patient_3_untreated")
##==Filtering
keep <- rowSums(cpm(y) > 10 ) >= 3
y <- y[keep,]
dim(y)
##Re-compute the library sizes:
y$samples$lib.size <- colSums(y$counts)
y$samples
##Normalizing
y <- calcNormFactors(y)
y$samples
plotMDS(y, top=500, main ="Multi-Dimensional Scaling Plot for Count Data")
###The design matrix
Patient <- factor( c( "Patient_1", "Patient_1","Patient_2",
"Patient_2","Patient_3", "Patient_3" ) )
Treat <- factor( c( "Treated", "Untread", "Treated", "Untread","Treated",
"Untread") )
data.frame( Sample = colnames(y), Patient,Treat)
design <- model.matrix( ~Patient + Treat )
rownames( design ) <- colnames( y )
y <- estimateGLMCommonDisp( y, design, verbose = TRUE )
y <- estimateGLMTrendedDisp( y, design )
y <- estimateGLMTagwiseDisp( y, design )
fit <- glmFit(y, design)
lrt <- glmLRT( fit )
top <- topTags( lrt, n = 50 )
q_value = 0.05
summary( de <- decideTestsDGE( lrt, p.value = q_value ) )
q_value = 0.1
summary( de <- decideTestsDGE( lrt, p.value = q_value ) )
plotBCV( y, cex = 0.8)
#sessionInfo( )
#R version 3.0.1 (2013-05-16)
#Platform: x86_64-apple-darwin10.8.0 (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] splines stats graphics grDevices utils datasets methods
base
#
#other attached packages:
#[1] ggplot2_0.9.3.1 biomaRt_2.16.0 edgeR_3.2.3 limma_3.16.5
#
#loaded via a namespace (and not attached):
# [1] MASS_7.3-27 RColorBrewer_1.0-5 RCurl_1.95-4.1 XML_3.95-0.2
colorspace_1.2-2
# [6] dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2
labeling_0.2
#[11] munsell_0.4 plyr_1.8 proto_0.3-10
reshape2_1.2.2 scales_0.2.3
#[16] stringr_0.6.2 tools_3.0.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotBCV.pdf
Type: application/pdf
Size: 115643 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130720/e06a5c5e/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotMDS.pdf
Type: application/pdf
Size: 47829 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130720/e06a5c5e/attachment-0001.pdf>
More information about the Bioconductor
mailing list