[BioC] edgeR

puvan001 at umn.edu puvan001 at umn.edu
Sun Jun 12 07:35:27 CEST 2011



I am sorry. I’ll describe what I have done.
I am looking for differentially expressed genes in two groups and each 
group has three replicates.
First I used common dispersion (counts.csv is my data file and 
totalreads.csv is about my library size).
counts <-read.csv("counts.csv",row.names=1)
 libsize <- read.csv("totalreads.csv", row.names=1)
 d <-DGEList (counts, group=c(2,2,2,1,1,1), lib.size=libsize$total
d <- DGEList(counts, group=c(2,2,2,1,1,1), lib.size=libsize$Total)
I removed reads < than 5 ( according to the manual )
 d <- d[rowSums(d$counts) >= 5, ]
 d <- estimateCommonDisp(d)
> d$common.dispersion
[1] 0.008390605
 de.com <- exactTest(d)
summary(decideTestsDGE(de.com, p.value=0.01))
-1    92
0  12357
1    190

I decided to use moderated tagwise dispersion and prior. n 10 (according to 
the manual I thought this would be better )
summary(d$tagwise.dispersion)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.006051 0.006051 0.011150 0.008987 0.011150 0.037460 
summary(decideTestsDGE(de.tgw,p.value=0.01))
   [,1] 
-1    96
0  12343
1    200

(What my understanding is 96 genes are down regulated and 200 genes are up 
regulated and I wanted to get the all these differentially expressed genes)

detop.tgw <-topTags (de.tgw, n=296)
topTags(de.tgw)

 my results-only a part

               ID    logConc    logFC      P.Value     	 FDR
>ENSSSCT00000011438 -15.33546 4.312736 1.806225e-97 2.282888e-93
>ENSSSCT00000004165 -15.34398 2.862672 1.236263e-88 7.812563e-85
>ENSSSCT00000013211 -14.81470 3.003443 1.145003e-77 4.823900e-74
>ENSSSCT00000011437 -16.88133 4.625023 4.776290e-73 1.509188e-69
>ENSSSCT00000010657 -18.66400 6.789652 1.482137e-71 3.746546e-68
>ENSSSCT00000013209 -18.85435 7.246780 1.757241e-67 3.701628e-64
>ENSSSCT00000019271 -15.67690 3.069440 7.803066e-66 1.408899e-62
>ENSSSCT00000017966 -15.86827 3.089246 7.932011e-63 1.253159e-59
>ENSSSCT00000013215 -16.27135 3.161484 2.973959e-58 4.176430e-55
>ENSSSCT00000011439 -16.20229 3.461734 2.624576e-55 3.317202e-52
>ENSSSCT00000010656 -34.07557 31.8809589 1.646600e-16 4.729859e-14
>ENSSSCT00000010084 -15.37232  1.3166928 3.997218e-16 1.122685e-13
>ENSSSCT00000003998 -15.32474 -1.3062885 4.289687e-16 1.178638e-13
>ENSSSCT00000018978 -14.38557  0.9368015 1.582580e-15 4.255793e-13
>ENSSSCT00000010832 -21.11786  5.8784200 7.617618e-15 2.005814e-12
>ENSSSCT00000012919 -19.87289  3.7289682 2.159284e-14 5.569629e-12
>ENSSSCT00000002141 -15.34515  1.2304344 2.586146e-14 6.537259e-12
>ENSSSCT00000015009 -16.73944  1.3602630 1.537101e-13 3.809299e-11
>ENSSSCT00000007972 -13.99426 -0.8061797 4.105738e-13 9.934890e-11
>ENSSSCT00000009832 -20.70634  4.7030671 4.166067e-13 9.934890e-11
>ENSSSCT00000017656 -15.16996  1.1164872 8.232265e-13 1.926807e-10
>ENSSSCT00000003967 -12.27699 -0.9214116 1.414101e-12 3.249604e-10
>ENSSSCT00000017482 -13.37026  0.7404062 2.139773e-12 4.829390e-10
>ENSSSCT00000019546 -13.99564  0.9811760 2.192554e-12 4.861701e-10

My confusion starts here. I thought the fold change for down regulated 
genes are lower than 0. But when I counted after changing into csv. format 
I got more than 96 genes (According to the summary, -1 is 96). Am I doing 
something wrong?
Generally, more than 2 fold difference is considered as differential 
expression. I am little bit confused about the value used by edgeR for 
differential expression. These questions may be too simple but my 
background is not statistics and I don't have a very good understanding. 
I'd really appreciate your comments.


Thanks

Sumathy



More information about the Bioconductor mailing list