[BioC] edgeR on differentially expressed genes with low read counts
Naomi Altman
naomi at stat.psu.edu
Tue Nov 29 18:33:34 CET 2011
You should filter out the genes with extremely
low total counts as you will not have enough power to achieve significance.
Naomi
At 08:54 AM 11/29/2011, LucÃa [guest] wrote:
>Hi,
>I am using the package edgeR to determine
>differentially expressed genes in RNA-seq data.
>I have 8 samples, corresponding to 3 patients (2
>conditions per patient and 2 patients
>duplicated). I performed the contrast of these
>two conditions, following the user guide
>(section 11 Case study: Oral carcinomas vs
>matched normal tissue). When analyzing some
>border line differentially expressed genes (FDR
>~ 0.02) I found that in some cases, the read
>counts was really low, e.g. only one sample with
>2 reads, and the others (7) with 0 counts.
> >
> countsTable[rownames(countsTable)=="ENSG00000207696",grep("CT",colnames(countsTable))]
> 61_CT.poli 61_CT.tot 67_CT.poli
> 67_CT.tot 70_CT.poli 70_CT.tot 61m2_CT.tot 67m2_CT.tot
>ENSG00000207696 0 0 0
> 0 2 0 0 0
>
>The design matrix used was:
> > design
> patientsCT61 patientsCT67 patientsCT70
> as.factor(1 - (as.numeric(groupsCT) - 1))1
>1 1 0 0
> 0
>2 0 1 0
> 0
>3 0 0 1
> 0
>4 1 0 0
> 0
>5 0 1 0
> 0
>6 1 0 0
> 1
>7 0 1 0
> 1
>8 0 0 1
> 1
>
>The result for the gene in question was:
> > tab.DGEensCT[rownames(tab.DGEensCT)=="ENSG00000207696",]
> table.logConc table.logFC
> table.PValue table.FDR adjust.method comparison
>ENSG00000207696 -18.34 7.726
>0.005222 0.03941 BH as.factor(1 - (as.numeric(groupsCT) - 1))1
>
>When investigating the result of the gene, I
>found that the glm-fitted values looked like this:
>
> >
> format((glmfit.DGEensCT$fitted.values)[rownames(glmfit.DGEensCT$fitted.values)=="ENSG00000207696",],scientific=FALSE,digits=4)
> 61_CT.tot 67_CT.tot 70_CT.tot
> 61m2_CT.tot 67m2_CT.tot 61_CT.poli 67_CT.poli 70_CT.poli
>"0.00008988" "0.00004048" "0.06229493"
>"0.00026943" "0.00014959" "0.03479503" "0.03522840" "1.84247667"
>
>Since I was not able to find the details used in
>edgeR to calculate the model easily, I was
>wondering if from this glm-fitted values is it
>reasonable to obtain such a low FDR?
>I used the common dispersion for the variance estimate:
>DGEensCT.CR<-estimateCRDisp(DGEensCT,design)
>glmfit.DGEensCT <- glmFit(DGEensCT,
>design,dispersion = DGEensCT.CR$CR.common.dispersion)
>lrt.DGEensCT <- glmLRT(DGEensCT, glmfit.DGEensCT)
>
>Could someone help me with this?
>Thank you very much in advanced!
>
>Cheers,
>LucÃa
>
> -- output of sessionInfo():
>
>R version 2.12.0 (2010-10-15)
>Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>locale:
>[1] C
>
>attached base packages:
>[1] stats graphics grDevices utils datasets methods base
>
>other attached packages:
>[1] biomaRt_2.6.0 edgeR_2.0.5
>
>loaded via a namespace (and not attached):
>[1] RCurl_1.4-3 XML_3.2-0 limma_3.6.9
>
>--
>Sent via the guest posting facility at bioconductor.org.
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list