[BioC] edgeR on differentially expressed genes with low read counts
LucÃa [guest]
guest at bioconductor.org
Tue Nov 29 14:54:51 CET 2011
Hi,
I am using the package edgeR to determine differentially expressed genes in RNA-seq data. I have 8 samples, corresponding to 3 patients (2 conditions per patient and 2 patients duplicated). I performed the contrast of these two conditions, following the user guide (section 11 Case study: Oral carcinomas vs matched normal tissue). When analyzing some border line differentially expressed genes (FDR ~ 0.02) I found that in some cases, the read counts was really low, e.g. only one sample with 2 reads, and the others (7) with 0 counts.
> countsTable[rownames(countsTable)=="ENSG00000207696",grep("CT",colnames(countsTable))]
61_CT.poli 61_CT.tot 67_CT.poli 67_CT.tot 70_CT.poli 70_CT.tot 61m2_CT.tot 67m2_CT.tot
ENSG00000207696 0 0 0 0 2 0 0 0
The design matrix used was:
> design
patientsCT61 patientsCT67 patientsCT70 as.factor(1 - (as.numeric(groupsCT) - 1))1
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 1 0 0 0
5 0 1 0 0
6 1 0 0 1
7 0 1 0 1
8 0 0 1 1
The result for the gene in question was:
> tab.DGEensCT[rownames(tab.DGEensCT)=="ENSG00000207696",]
table.logConc table.logFC table.PValue table.FDR adjust.method comparison
ENSG00000207696 -18.34 7.726 0.005222 0.03941 BH as.factor(1 - (as.numeric(groupsCT) - 1))1
When investigating the result of the gene, I found that the glm-fitted values looked like this:
> format((glmfit.DGEensCT$fitted.values)[rownames(glmfit.DGEensCT$fitted.values)=="ENSG00000207696",],scientific=FALSE,digits=4)
61_CT.tot 67_CT.tot 70_CT.tot 61m2_CT.tot 67m2_CT.tot 61_CT.poli 67_CT.poli 70_CT.poli
"0.00008988" "0.00004048" "0.06229493" "0.00026943" "0.00014959" "0.03479503" "0.03522840" "1.84247667"
Since I was not able to find the details used in edgeR to calculate the model easily, I was wondering if from this glm-fitted values is it reasonable to obtain such a low FDR?
I used the common dispersion for the variance estimate:
DGEensCT.CR<-estimateCRDisp(DGEensCT,design)
glmfit.DGEensCT <- glmFit(DGEensCT, design,dispersion = DGEensCT.CR$CR.common.dispersion)
lrt.DGEensCT <- glmLRT(DGEensCT, glmfit.DGEensCT)
Could someone help me with this?
Thank you very much in advanced!
Cheers,
LucÃa
-- output of sessionInfo():
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.6.0 edgeR_2.0.5
loaded via a namespace (and not attached):
[1] RCurl_1.4-3 XML_3.2-0 limma_3.6.9
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list