[BioC] edgeR - multiple comparisons
Davis McCarthy
dmccarthy at wehi.EDU.AU
Fri May 27 04:52:44 CEST 2011
Hi Sridhara
I do not think that the genes with all-zero counts in group A and group C are causing the results you see.
I just tested this on a dataset with 9 groups, and comparing two groups, A and B, with 285 genes with all zero counts in groups A and B yielded "expected" p-values and FDRs. Therefore I do not think that your p-values all being 1 is driven by these all-zero genes.
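For reference, here is a minimal base R sketch of how one might count (and, if desired, set aside) the genes that are zero in every library of the two groups being compared. The first two rows are the example counts from the quoted message below; the other two rows, the gene names, and the object names (`counts`, `group`, `counts.AC`) are made up for illustration and are not taken from your data.

```r
# Toy count matrix: rows 1-2 follow the example in the quoted message,
# rows 3-4 and all gene names are hypothetical.
counts <- matrix(
  c(  0,   0, 11, 12,  0,  0,
    120, 102, 45, 38, 30, 40,
      0,   0,  7,  9,  0,  0,
     15,  22,  0,  0, 18, 25),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("A1", "A2", "B1", "B2", "C1", "C2"))
)
group <- c("A", "A", "B", "B", "C", "C")

# Libraries that belong to the two groups in the comparison of interest.
in.pair <- group %in% c("A", "C")

# Genes with a zero count in every library of both groups.
all.zero <- rowSums(counts[, in.pair]) == 0
sum(all.zero)

# Counts for the A versus C comparison with the all-zero genes set aside.
counts.AC <- counts[!all.zero, in.pair]
counts.AC
```

This only shows how to identify such genes; as noted above, I do not expect that removing them will change the p-values for the remaining genes.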
Is there truly very little difference in expression between groups A and C relative to biological variability in your data? You could have a look at the counts (raw, normalized or counts per million) for the top-ranked (even if not significant) genes for your group A - group C comparison.
If you see little difference in expression between the groups for the top genes, then you may have no differential expression between these groups. If, on the other hand, there do appear to be large differences in expression between the groups, then you may have found a bug in the p-values that are being output, and we can go ahead and try to fix the issue.
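A rough sketch of that check in base R, assuming a count matrix with one column per library. The library sizes, gene names and the `top.genes` vector here are hypothetical placeholders; in practice you would use your real library sizes and the gene names at the top of your A versus C results.

```r
# Toy counts for two genes (values from the example in the quoted message);
# gene names are hypothetical.
counts <- matrix(
  c(  0,   0, 11, 12,  0,  0,
    120, 102, 45, 38, 30, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"),
                  c("A1", "A2", "B1", "B2", "C1", "C2"))
)
group <- c("A", "A", "B", "B", "C", "C")

# Hypothetical library sizes (total reads per library), for illustration only.
lib.size <- c(A1 = 2.0e6, A2 = 1.8e6, B1 = 2.2e6,
              B2 = 2.0e6, C1 = 1.9e6, C2 = 2.1e6)

# Counts per million: divide each column by its library size, scale by 1e6.
cpm <- t(t(counts) / lib.size) * 1e6

# Placeholder for the top-ranked genes of the A versus C comparison.
top.genes <- c("gene2")

# Per-library CPM for those genes, restricted to groups A and C.
round(cpm[top.genes, group %in% c("A", "C"), drop = FALSE], 1)

# Mean CPM per group, to see the size of the difference at a glance.
mean.A <- rowMeans(cpm[top.genes, group == "A", drop = FALSE])
mean.C <- rowMeans(cpm[top.genes, group == "C", drop = FALSE])
cbind(mean.A, mean.C)
```

If the group means for the top genes differ by little compared with the spread between replicates, that would be consistent with there being no detectable differential expression between A and C.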
I notice that you are using R 2.12 and edgeR version 2.0.3. I would recommend updating to R 2.13 and the latest release of edgeR. Many improvements have been made to the package since version 2.0.3, and any bug fixes (if required) will roll out to the current release and devel versions, not to legacy versions of the package.
Cheers
Davis
On May 26, 2011, at 6:16 AM, Sridhara Gupta Kunjeti wrote:
> Hello Mark,
> Thank you very much for your email. It greatly helped me to export the FDR,
> p-value, logFC and logConc into csv format.
> I have one real quick question, this is more of a statistical question.
> After exporting the FDR, I started analyzing pair by pair. In the below
> example, what I noticed is when comparing the group A - B, I got p-value and
> FDR that make sense. But, when I checked for the group A - group C
> comparison, all the 10,000 genes had FDR and p-value of 1. Then I counted
> the number of genes that had "0" in both the groups for both the replicates,
> it turned out to be about 400 genes. So, my question is why the other genes
> (9600) had FDR and p-value of "1". Do you think the 400 genes with "0"
> counts would affect the analysis? Do I need to delete these 400 genes for
> the pair (gp A - gp C) comparison and then run an edgeR analysis
> individually?
>
>           Group A      Group B      Group C
> Genes     A1    A2     B1    B2     C1    C2
> 1          0     0     11    12      0     0
> 2        120   102     45    38     30    40
>
>
> Any help or comments will be appreciated.
>
> Many thanks!
> Sridhara
>
>
> On Sun, May 22, 2011 at 4:24 PM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
>
>> Hi Sridhara,
>>
>> The problem here is that the output of topTags() (your 'fdr06') is not a
>> data.frame or matrix, which is what write.table() works best on. Instead,
>> try:
>>
>> fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH",
>> sort.by="p.value")
>> write.table(fdr06$table, file = "FDR06.csv", sep=",")
>>
>> Cheers,
>> Mark
>>
>> On May 22, 2011, at 11:02 PM, Sridhara Gupta Kunjeti wrote:
>>
>>> Hello Mark,
>>> Thanks for your email. I have one quick question. Is it possible to
>>> export all the 10,427 genes after passing exactTest()? What argument do I
>>> need to use to do that? Basically I wanted the complete list of genes with
>>> the following info:
>>>> topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
>>> Comparison of groups: T6-P18
>>>
>>>                                                              logConc     logFC       PValue          FDR
>>> PITG_08841 | Pi conserved hypothetical protein (129 nt)    -28.79463 42.442850 1.032735e-11 1.076833e-07
>>> PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt) -12.93992  9.148329 1.288618e-09 6.193586e-06
>>>
>>> If I use the following argument, it is showing an error message.
>>>
>>> fdr06<- topTags(de06.tgw, n = 10,427, adjust.method = "BH", sort.by ="p.value")
>>> write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA, qmethod="double")
>>> Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, :
>>>   arguments imply differing number of rows: 10427, 1, 2
>>>
>>> If I do the same with n = 10426, it executes without any error,
>>> except that I am missing one row.
>>>
>>> Any suggestions on how to export all the columns for all the rows will be
>>> a great help.
>>>
>>> Many thanks!
>>> Sridhara
>>>
>>>
>>>
>>>
>>> On Sun, May 22, 2011 at 5:34 AM, Mark Robinson <mrobinson at wehi.edu.au>
>> wrote:
>>> Hi Sridhara,
>>>
>>> If you haven't already, you might have a solid read of the edgeR user's
>>> guide; it has answers to some of your questions.
>>>
>>>
>>> On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:
>>>
>>>> Hello,
>>>> I have used edgeR for DGE analysis and I have a few questions regarding
>>>> the model and comparisons.
>>>>
>>>> 1) What kind of statistical model is taken into account to analyze
>>>> treatment structure and conduct analysis of variance?
>>>
>>> For the example you show below (a 2-group comparison), the 'Negative
>>> binomial models' Section in the user's guide covers this. Of course, the
>>> package has facility for more complicated "treatment structure" through
>>> generalized linear models (See the 'Experiment with multiple factors'
>>> Section, for example).
>>>
>>>
>>>> 2) How does edgeR correct for multiple comparisons?
>>>
>>> See ?topTags; it's also mentioned in the user's guide.
>>>
>>> ----
>>> topTags(object, n=10, adjust.method="BH", sort.by="p.value")
>>> ...
>>> adjust.method: character string stating the method used to adjust
>>> p-values for multiple testing, passed on to 'p.adjust'
>>> ...
>>> ----
>>>
>>>
>>>> 3) I am assuming that the calculated p-values in the output after
>>>> performing the tagwiseDispersion are after adjusting for multiple
>>>> testing. Please correct me if I am wrong. If so, what kind of multiple
>>>> testing is taken into account?
>>>
>>> exactTest() doesn't do the multiple testing correction, but topTags() does.
>>>
>>> HTH,
>>> Mark
>>>
>>>
>>>>
>>>> The arguments that I passed are as follows:
>>>>> raw.data <- read.delim("c33_con3.txt")
>>>>> raw.data.2a <- read.delim ("2c33_con3.txt")
>>>>> d2a <- raw.data.2a[, 2:5]
>>>>> rownames(d2a) <- raw.data.2a[,1]
>>>>> group2a <- c(rep("c33", 2), rep("con3", 2))
>>>>> d2a <- DGEList(counts = d2a, group = group2a)
>>>>> d2a <- estimateCommonDisp(d2a)
>>>>> d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
>>>>> prior.n2a <- estimateSmoothing(d2a)
>>>>> de2a.tgw <- exactTest(d2a, common.disp = FALSE)
>>>>> de2a.tgw
>>>> An object of class "DGEExact"
>>>> $table
>>>>
>>>>                                                                      logConc       logFC   p.value
>>>> MGG_00005 | Mo hypothetical protein (1014 nt)                      -16.67772  0.05248378 0.9394668
>>>> MGG_00015 | Mo catechol O-methyltransferase (1102 nt)              -14.68066  0.36189877 0.2786389
>>>> MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)            -13.50677  0.32379041 0.3759259
>>>> MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt) -14.28686 -0.35747999 0.3040601
>>>> MGG_00018 | Mo integral membrane protein (2504 nt)                 -14.56791  0.45187243 0.1701996
>>>> 11452 more rows ...
>>>> $comparison
>>>> [1] "c33" "con3"
>>>> $genes
>>>> NULL
>>>>
>>>>
>>>>> sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>> other attached packages:
>>>> [1] edgeR_2.0.3
>>>> loaded via a namespace (and not attached):
>>>> [1] limma_3.6.9 tools_2.12.1
>>>>
>>>> I would really appreciate your comments or suggestions.
>>>>
>>>> Many thanks!
>>>>
>>>> Sridhara
>>>>
>>>> --
>>>> Sridhara G Kunjeti
>>>> PhD Candidate
>>>> University of Delaware
>>>> Department of Plant and Soil Science
>>>> email- sridhara at udel.edu
>>>> Ph: 832-566-0011
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> ------------------------------
>>> Mark Robinson, PhD (Melb)
>>> Epigenetics Laboratory, Garvan
>>> Bioinformatics Division, WEHI
>>> e: mrobinson at wehi.edu.au
>>> e: m.robinson at garvan.org.au
>>> p: +61 (0)3 9345 2628
>>> f: +61 (0)3 9347 0852
>>> ------------------------------
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>>> ______________________________________________________________________
>>>
>>>
>>>
>>
>
------------------------------------------------------------------------
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccarthy at wehi.edu.au
http://www.wehi.edu.au