[BioC] EdgeR: Using estimateCommondisp for housekeeping genes
Tom Keller
kellert at ohsu.edu
Mon Nov 14 23:07:54 CET 2011
This was very helpful to me. But when I compare an R created subset to an expression set I created manually based on the same criteria, they "look" the same, but the R function equals() says FALSE:
> equals(c_eset, co)
[1] FALSE
> equals.Verbose(c_eset, co)
[1] FALSE
attr(,"reason")
[1] "Not same class"
> class(c_eset)
[1] "matrix"
> class(co)
[1] "matrix"
> head(c_eset)
01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
hsa-miR-105_st 6.795107 6.913433 3.580257 2.749580 6.854192 7.477093 3.834115 3.678890
hsa-miR-10b_st 5.949392 6.248459 6.824051 6.804110 6.812655 5.530932 5.739557 6.231381
hsa-miR-1180_st 6.294834 5.884889 7.562048 6.952669 6.779352 5.939806 7.055804 7.188879
hsa-miR-1225-5p_st 4.754944 5.403615 3.803957 3.752518 5.513498 6.534705 5.981105 5.429791
hsa-miR-1260_st 3.995493 3.621449 2.846316 2.297834 3.051231 2.566250 3.323427 3.978176
hsa-miR-127-5p_st 6.252869 6.829449 7.170236 7.086132 5.743895 5.812096 6.813619 6.844393
> head(co)
01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
hsa-miR-105_st 6.795107 6.913433 3.580257 2.749580 6.854192 7.477093 3.834115 3.678890
hsa-miR-10b_st 5.949392 6.248459 6.824051 6.804110 6.812655 5.530932 5.739557 6.231381
hsa-miR-1180_st 6.294834 5.884889 7.562048 6.952669 6.779352 5.939806 7.055804 7.188879
hsa-miR-1225-5p_st 4.754944 5.403615 3.803957 3.752518 5.513498 6.534705 5.981105 5.429791
hsa-miR-1260_st 3.995493 3.621449 2.846316 2.297834 3.051231 2.566250 3.323427 3.978176
hsa-miR-127-5p_st 6.252869 6.829449 7.170236 7.086132 5.743895 5.812096 6.813619 6.844393
> summary(co)
01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
Min. : 1.877 Min. : 2.176 Min. : 1.767 Min. : 1.727 Min. : 1.774 Min. : 2.022 Min. : 1.779 Min. : 1.738
1st Qu.: 3.976 1st Qu.: 4.085 1st Qu.: 2.789 1st Qu.: 2.669 1st Qu.: 3.881 1st Qu.: 4.085 1st Qu.: 2.618 1st Qu.: 2.585
Median : 5.621 Median : 5.331 Median : 3.767 Median : 3.812 Median : 5.557 Median : 5.522 Median : 3.568 Median : 3.581
Mean : 5.567 Mean : 5.616 Mean : 4.247 Mean : 4.176 Mean : 5.643 Mean : 5.708 Mean : 4.087 Mean : 4.180
3rd Qu.: 6.816 3rd Qu.: 6.861 3rd Qu.: 5.316 3rd Qu.: 5.221 3rd Qu.: 7.040 3rd Qu.: 6.828 3rd Qu.: 5.098 3rd Qu.: 5.184
Max. :10.387 Max. :10.703 Max. :11.126 Max. :11.105 Max. :10.939 Max. :10.958 Max. :10.796 Max. :10.930
> summary(c_eset)
01A_H1_08a_446VD.CEL 01A_H1_08b_446VD.CEL 01A_H1_22a_446VD.CEL 01A_H1_22b_446VD.CEL 02A_H1_10a_446VD.CEL 02A_H1_10b_446VD.CEL 02A_H1_24a_446VD.CEL 02A_H1_24b_446VD.CEL
Min. : 1.877 Min. : 2.176 Min. : 1.767 Min. : 1.727 Min. : 1.774 Min. : 2.022 Min. : 1.779 Min. : 1.738
1st Qu.: 3.976 1st Qu.: 4.085 1st Qu.: 2.789 1st Qu.: 2.669 1st Qu.: 3.881 1st Qu.: 4.085 1st Qu.: 2.618 1st Qu.: 2.585
Median : 5.621 Median : 5.331 Median : 3.767 Median : 3.812 Median : 5.557 Median : 5.522 Median : 3.568 Median : 3.581
Mean : 5.567 Mean : 5.616 Mean : 4.247 Mean : 4.176 Mean : 5.643 Mean : 5.708 Mean : 4.087 Mean : 4.180
3rd Qu.: 6.816 3rd Qu.: 6.861 3rd Qu.: 5.316 3rd Qu.: 5.221 3rd Qu.: 7.040 3rd Qu.: 6.828 3rd Qu.: 5.098 3rd Qu.: 5.184
Max. :10.387 Max. :10.703 Max. :11.126 Max. :11.105 Max. :10.939 Max. :10.958 Max. :10.796 Max. :10.930
I don't understand why equals() gives "FALSE". They seem to be identical.
thanks for any suggestions or references.
Tom
MMI DNA Services Core Facility
503-494-2442
kellert at ohsu.edu
Office: 6588 RJH (CROET/BasicScience)
OHSU Shared Resources
On Nov 13, 2011, at 7:35 AM, Mark Robinson wrote:
> Hi Tonya,
>
> I believe your question comes down to how to subset a DGEList object. As the example 'd[housekeeping,]' suggests, it is much like subletting a matrix (if you don't know how to do this, you should consult an Intro to R manual).
>
> Here is an example:
>
> y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4)
> rownames(y) <- paste("Gene",1:nrow(y),sep=".")
> group <- factor(c(1,1,2,2))
> d <- DGEList(counts=y,group=group,lib.size=rep(1000,4))
>
> If you knew your housekeeping genes were in rows {4,6,10,15} of your table, you could simply call:
>
> do <- estimateCommonDisp(d[c(4,6,10,15),])
>
> Of course, there are lots of ways to subset, e.g.:
> http://www.ats.ucla.edu/stat/r/modules/subsetting.htm
>
> Equivalent to above but slight different approach, you could do:
>
> gkeep <- paste("Gene",c(4,6,10,15),sep=".")
> do <- estimateCommonDisp(d[rownames(d) %in% gkeep,])
>
> … and so on.
>
> On the whole enterprise of doing these analyses w/o replicates, there has been a lot of discussion:
>
> http://seqanswers.com/forums/showthread.php?t=4055
> http://seqanswers.com/forums/showthread.php?t=10137
> http://seqanswers.com/forums/showthread.php?t=11081
> https://stat.ethz.ch/pipermail/bioconductor/2011-July/040296.html
> … and so on.
>
> All the best,
> Mark
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
>
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson at imls.uzh.ch
> o: Y32-J-34
> w: http://tiny.cc/mrobin
>
>
>
>
> On 12.11.2011, at 23:12, Tonya Mariko Brunetti wrote:
>
>> Hello,
>>
>> My name is Tonya and I am very new to both R and edgeR so sorry if this seems silly. I have recently gotten back results of two samples from a 454 and do not have replicates of either. I was reading the edgeR manual section about what to do about calculating common dispersion factors if no replicates are available.
>>
>> One of the options was to use genes that are not suppose to be differentially expressed (ie housekeeping genes) to determine the common dispersion. In the manaul they show an example do<-estimateCommonDisp(d[housekeeping,]).
>>
>> Would anyone please explain to me how I can use the housekeeping genes in the data I have collected to estimate this value. I have tried numerous things for input into the function estimateCommonDisp (see below some of what I have tried) but I guess I don't know how to specify just the housekeeping genes?? Or if anyone has a method for common dispersion in edgeR that will work for no replicates that would be appreciated as well!
>>
>> estimateCommonDisp(d['RpS2','RpS28b]) (where the stuff in brackets are my housekeeping genes and d is my normalized DGEList
>> estimateCommonDisp(d[RpS2,RpS28b])
>>
>>
>> Thank you so much!
>>
>> Tonya
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list