[BioC] edgeR: Using estimateCommonDisp for housekeeping genes

Sun Nov 13 16:35:39 CET 2011

Hi Tonya,

I believe your question comes down to how to subset a DGEList object.  As the example 'd[housekeeping,]' suggests, it is much like subsetting a matrix (if you don't know how to do this, you should consult an Intro to R manual).

Here is an example:

library(edgeR)
y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4)
rownames(y) <- paste("Gene",1:nrow(y),sep=".")
group <- factor(c(1,1,2,2))
d <- DGEList(counts=y,group=group,lib.size=rep(1000,4))

If you knew your housekeeping genes were in rows {4,6,10,15} of your table, you could simply call:

do <- estimateCommonDisp(d[c(4,6,10,15),])

Of course, there are lots of ways to subset, e.g.:
http://www.ats.ucla.edu/stat/r/modules/subsetting.htm

Equivalent to the above but with a slightly different approach, you could do:

gkeep <- paste("Gene",c(4,6,10,15),sep=".")
do <- estimateCommonDisp(d[rownames(d) %in% gkeep,])

… and so on.
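
For the case in the original question (gene names rather than row numbers, and two libraries with no replicates), a minimal sketch might look like the following. It runs on simulated counts, not real data. The gene names other than RpS2 and RpS28b are made-up placeholders, and assigning both libraries to a single group before estimating the dispersion is an assumption based on the housekeeping approach described in the edgeR user's guide, so check it against the guide for your installed version.

```r
library(edgeR)

# Simulated counts for two unreplicated samples (illustrative only).
# "HK3", "HK4" and "HK5" are hypothetical housekeeping gene names.
set.seed(1)
genes <- c("RpS2", "RpS28b", "HK3", "HK4", "HK5", paste("Gene", 1:15, sep="."))
y <- matrix(rnbinom(40, size=1/0.2, mu=50), nrow=20, ncol=2,
            dimnames=list(genes, c("A", "B")))
d <- DGEList(counts=y, group=factor(c(1,2)))

# Gene names are character strings: quote them and combine them with c()
housekeeping <- c("RpS2", "RpS28b", "HK3", "HK4", "HK5")
keep <- rownames(d) %in% housekeeping

# Assumption: place both libraries in one group, so that the differences
# between them at the housekeeping genes are what inform the dispersion
d1 <- d
d1$samples$group <- 1
d0 <- estimateCommonDisp(d1[keep,])

# Carry the estimate over to the full data set
d$common.dispersion <- d0$common.dispersion
```

Note that d['RpS2','RpS28b'] asks for row RpS2 and column RpS28b rather than two rows, and d[RpS2,RpS28b] looks for R variables called RpS2 and RpS28b, which is why neither attempt selects the housekeeping genes.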

On the whole enterprise of doing these analyses w/o replicates, there has been a lot of discussion:

http://seqanswers.com/forums/showthread.php?t=4055
http://seqanswers.com/forums/showthread.php?t=10137
http://seqanswers.com/forums/showthread.php?t=11081
https://stat.ethz.ch/pipermail/bioconductor/2011-July/040296.html
… and so on.

All the best,
Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y32-J-34
w: http://tiny.cc/mrobin

On 12.11.2011, at 23:12, Tonya Mariko Brunetti wrote:

> Hello,
> 
> My name is Tonya and I am very new to both R and edgeR so sorry if this seems silly.  I have recently gotten back results of two samples from a 454 and do not have replicates of either.  I was reading the edgeR manual section about what to do about calculating common dispersion factors if no replicates are available.
> 
> One of the options was to use genes that are not supposed to be differentially expressed (i.e. housekeeping genes) to determine the common dispersion.  In the manual they show an example do<-estimateCommonDisp(d[housekeeping,]).
> 
> Would anyone please explain to me how I can use the housekeeping genes in the data I have collected to estimate this value.  I have tried numerous things for input into the function estimateCommonDisp (see below some of what I have tried) but I guess I don't know how to specify just the housekeeping genes??  Or if anyone has a method for common dispersion in edgeR that will work for no replicates that would be appreciated as well!
> 
> estimateCommonDisp(d['RpS2','RpS28b]) (where the stuff in brackets are my housekeeping genes and d is my normalized DGEList
> estimateCommonDisp(d[RpS2,RpS28b])
> 
> 
> Thank you so much!
> 
> Tonya
> 