Hi Sridhara
The function exactTest() has an argument 'pair', which defines the two groups you wish to compare. The order of the groups in 'pair' determines the direction of the log-fold change: the first group is the baseline and the second group is compared against it.
For instance if you called
> exactTest(d1, common.disp=FALSE, pair=c("con3","dca3"))
then you will get the comparison dca3 - con3. If you instead use pair=c("dca3","con3"), you will get the comparison con3 - dca3.
This is a much better approach to making different comparisons between groups than renaming the columns of your count matrix.
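To make the effect of 'pair' concrete, here is a minimal sketch. It assumes a reasonably recent edgeR is installed and relies on exactTest()'s default dispersion choice rather than the common.disp argument used elsewhere in this thread, which later releases may not accept. The simulated counts and the object names (counts, d, de.fwd, de.rev) are made up for illustration and are not your data.

```r
library(edgeR)

# Simulated counts, purely for illustration: 1000 genes, 2 replicates per group.
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 10), ncol = 4,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("con3.1", "con3.2", "dca3.1", "dca3.2")))
group <- factor(c("con3", "con3", "dca3", "dca3"))

d <- DGEList(counts = counts, group = group)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)

# Treatment relative to control: logFC > 0 means higher in dca3.
de.fwd <- exactTest(d, pair = c("con3", "dca3"))

# Same test with the pair reversed: the sign of logFC flips.
de.rev <- exactTest(d, pair = c("dca3", "con3"))

# Compare the two sets of log-fold changes side by side.
head(cbind(fwd = de.fwd$table$logFC, rev = de.rev$table$logFC))
```

If you always pass pair=c(control, treatment), a positive logFC consistently means up-regulation in the treatment, whatever the column order of your input file.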
In principle, ?exactTest should have been sufficient to answer this question for you. It's considered good form to read the documentation thoroughly before posting to the list.
plotSmear() operates on DGEExact objects (the output from exactTest), so this avoids the final issue you're worried about entirely: the direction of the log-fold change on the plot will match that in your exact test. Again, ?plotSmear has details.
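A rough sketch of that workflow, again on simulated counts and assuming a recent edgeR in which plotSmear() is still available. The object names and the choice to highlight the 20 top-ranked genes are arbitrary and only for illustration.

```r
library(edgeR)

# Simulated counts, purely for illustration.
set.seed(2)
counts <- matrix(rnbinom(4000, mu = 50, size = 10), ncol = 4,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("con3.1", "con3.2", "dca3.1", "dca3.2")))
group <- factor(c("con3", "con3", "dca3", "dca3"))

d <- DGEList(counts = counts, group = group)
d <- estimateCommonDisp(d)

# logFC is dca3 relative to con3, because con3 comes first in 'pair'.
de <- exactTest(d, pair = c("con3", "dca3"))

# Names of the 20 top-ranked genes, to highlight on the plot.
top <- rownames(topTags(de, n = 20)$table)

# The plot takes its logFC values straight from the DGEExact object,
# so points above zero are higher in dca3.
plotSmear(de, de.tags = top)
abline(h = 0, lty = 2)
```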
Cheers
Davis
On Jun 1, 2011, at 4:35 AM, Sridhara Gupta Kunjeti wrote:
> Hello Davis,
> Yes, this helped me to solve the problem. On the other hand, I have a different kind of question, which is related to the exactTest. First two columns in my inputs files are the counts for the control group.
> =============================================
> example 1: Textfile1
> Gene con3-1 con3-2 dca-1 dca-2.
> when I run the exactTest
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups: dca3 - con3
>
> So, if the logFC is positive, it means it is up-regulated in dca3, and these dots are plotted above '0' in the plotsmear.
>
> ================================================
> example 2:
> When I swap the columns
> Gene dca-1 dca-2 con3-1 con3-2
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups: con3 - dca3
>
> Here if the logFC is negative, it means it is up-regulated in dca3, and these are plotted below '0' in the plotSmear.
>
> Here the bottom line is if I swap the columns, when I run the exactTest, it changes the sequence in pairing. In other words pairs change from dca3 - con3 to con3 - dca3.
>
> This worked absolutely fine with 6 pairs. But for four pairs, even when I swap the columns in the input data, in the exactTest the sequence is not changing. i.e., con3 - c33 does not change to c33-con3
>
> My worry is that if I look at the logFC values, for some of the pairs a "+" value means up-regulated in the treatment, and for others a "-" value does. I am assuming this is going to be a problem when I generate the plotSmear, i.e. the plots will be inconsistent.
>
> Any help in generating consistent logFC values (positive for upregulation in treatment) will be appreciated.
>
> Thanks,
> Sridhara
>
>
>
> On Mon, May 30, 2011 at 2:33 AM, Davis McCarthy wrote:
> Hi Sridhara
>
> I'm not sure I completely follow what you're saying about the FDRs being 0.5, 0.6 etc. Can you show us a top table? Output of topTags(). Actually it would be good to see all of your edgeR function calls to get a better idea of how you're carrying out your analysis. In principle I don't think that "0" in the data will have any adverse effects on your analysis, so I'm not really sure what the results are that you're trying to describe.
>
> If you are in an R 2.13 session and enter the commands:
> source("http://www.bioconductor.org/biocLite.R")
> biocLite("edgeR")
> then edgeR version 2.2.5 will be installed on your system. I would recommend following the latest version of the edgeR User's Guide, which was released with edgeR 2.2.x. You can get it from edgeR's Bioconductor page:
> http://www.bioconductor.org/packages/2.8/bioc/html/edgeR.html
>
> Hope that helps.
>
> Cheers
> Davis
>
>
> On May 27, 2011, at 9:36 PM, Sridhara Gupta Kunjeti wrote:
>
>> Hello Davis,
>> Thank you very much for your email. After looking at one of my comparisons, the p-values make total sense. But I did notice that out of 10827 genes, most of them (10820) had an FDR of 1 and the rest had FDRs of 0.5, 0.6, 0.7, 0.8 and so on. I was wondering whether "0" counts in the data could cause these FDRs?
>>
>> I will also install latest version of R 2.13 and also the edgeR. Could you please let me know the latest version of edgeR that is available for me to download? I am assuming I can still follow the same manual (from version 2.0.3) for the new version of edgeR.
>>
>> Many thanks!
>> Sridhara
>>
>>
>> On Thu, May 26, 2011 at 10:52 PM, Davis McCarthy wrote:
>> Hi Sridhara
>>
>> I do not think it is the genes with all zero counts for group A and group C that are causing the results you see.
>>
>> I just tested this on a dataset with 9 groups, and comparing two groups, A and B, with 285 genes with all zero counts in groups A and B yielded "expected" p-values and FDRs. Therefore I do not think that your p-values all being 1 is driven by these all-zero genes.
>>
>> Is there truly very little difference in expression between groups A and C relative to biological variability in your data? You could have a look at the counts (raw, normalized or counts per million) for the top-ranked (even if not significant) genes for your group A - group C comparison.
>>
>> If you see little difference in expression between the groups for the top genes then you may have no differential expression between these groups. If, on the other hand, there does look to be large differences in expression between the groups then you may have found a bug in the p-values that are being output and we can go ahead and try to fix the issue.
>>
>> I notice that you are using R 2.12 and edgeR version 2.0.3. I would recommend updating to R 2.13 and the latest release of edgeR---there have been many improvements made to the package since version 2.0.3 and any bug fixes (if required) will roll out to the current release and devel versions, not legacy versions of the package.
>>
>> Cheers
>> Davis
>>
>>
>>
>> On May 26, 2011, at 6:16 AM, Sridhara Gupta Kunjeti wrote:
>>
>> > Hello Mark,
>> > Thank you very much for your email. It greatly helped me to export the FDR,
>> > p-value, logFC and logConc into csv format.
>> > I have one real quick question, this is more of statistical question.
>> > After exporting the FDR, I started analyzing pair by pair. In the below
>> > example, what I noticed is when comparing the group A - B, I got p-value and
>> > FDR that make sense. But, when I checked the group A - group C
>> > comparison, all the 10,000 genes had FDR and p-value of 1. Then I counted
>> > the number of genes that had "0" in both the groups for both the replicates,
>> > it turned out to be about 400 genes. So, my question is why the other genes
>> > (9600) had FDR and p-value of "1". Do you think the 400 genes with "0"
>> > counts would affect the analysis? Do I need to delete these 400 genes for
>> > the pair (gp A - gp C) comparison and then run an edgeR analysis
>> > individually?
>> >
>> >        Group A    Group B    Group C
>> > Genes  A1   A2    B1   B2    C1   C2
>> > 1       0    0    11   12     0    0
>> > 2     120  102    45   38    30   40
>> >
>> >
>> > Any help or comments will be appreciated.
>> >
>> > Many thanks!
>> > Sridhara
>> >
>> >
>> > On Sun, May 22, 2011 at 4:24 PM, Mark Robinson wrote:
>> >
>> >> Hi Sridhara,
>> >>
>> >> The problem here is that the output of topTags() (your 'fdr06') is not a
>> >> data.frame or matrix, which is what write.table() works best on. Instead,
>> >> try:
>> >>
>> >> fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH",
>> >> sort.by="p.value")
>> >> write.table(fdr06$table, file = "FDR06.csv", sep=",")
>> >>
>> >> Cheers,
>> >> Mark
>> >>
>> >> On May 22, 2011, at 11:02 PM, Sridhara Gupta Kunjeti wrote:
>> >>
>> >>> Hello Mark,
>> >>> Thanks for your email. I have one quick question. Is it possible to
>> >> export all the 10,427 genes after passing exactTest()? what argument do I
>> >> need to use to do that? Basically I wanted the complete list of genes with
>> >> the following info:
>> >>>> topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
>> >>> Comparison of groups: T6-P18
>> >>>
>> >>>                                                              logConc     logFC       PValue          FDR
>> >>> PITG_08841 | Pi conserved hypothetical protein (129 nt)     -28.79463 42.442850 1.032735e-11 1.076833e-07
>> >>> PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt)  -12.93992  9.148329 1.288618e-09 6.193586e-06
>> >>>
>> >>> If I use the following argument, it is showing an error message.
>> >>>
>> >>> fdr06 <- topTags(de06.tgw, n = 10,427, adjust.method = "BH", sort.by = "p.value")
>> >>> write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA, qmethod="double")
>> >>> Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, : arguments imply differing number of rows: 10427, 1, 2
>> >>>
>> >>> If I do the same with n = 10426, it executes without any error, except that I am missing one row.
>> >>>
>> >>> Any suggestions on how to export all the columns for all the rows would be a great help.
>> >>>
>> >>> Many thanks!
>> >>> Sridhara
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, May 22, 2011 at 5:34 AM, Mark Robinson
>> >> wrote:
>> >>> Hi Sridhara,
>> >>>
>> >>> If you haven't already, you might have a solid read of the edgeR user's
>> >> guide, it has answers to some of your questions.
>> >>>
>> >>>
>> >>> On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:
>> >>>
>> >>>> Hello,
>> >>>> I have used edgeR for DGE analysis and I have a few questions regarding the
>> >>>> model and comparisons.
>> >>>>
>> >>>> 1) What kind of statistical model is taken into account to analyze
>> >> treatment
>> >>>> structure and conduct analysis of variance?
>> >>>
>> >>> For the example you show below (a 2-group comparison), the 'Negative
>> >> binomial models' Section in the user's guide covers this. Of course, the
>> >> package has facility for more complicated "treatment structure" through
>> >> generalized linear models (See the 'Experiment with multiple factors'
>> >> Section, for example).
>> >>>
>> >>>
>> >>>> 2) How does edgeR correct for multiple comparisons?
>> >>>
>> >>> See ?topTags; it's also mentioned in the user's guide.
>> >>>
>> >>> ----
>> >>> topTags(object, n=10, adjust.method="BH", sort.by="p.value")
>> >>> ...
>> >>> adjust.method: character string stating the method used to adjust
>> >>> p-values for multiple testing, passed on to 'p.adjust'
>> >>> ...
>> >>> ----
>> >>>
>> >>>
>> >>>> 3) I am assuming that the calculated p-values in the output after
>> >>>> performing the tagwiseDispersion are after adjusting for multiple
>> >> testing.
>> >>>> Please correct me if I am wrong? If so, what kind of multiple testing
>> >> is
>> >>>> taken into account?
>> >>>
>> >>> exactTest() doesn't do the multiple testing correction, but topTags()
>> >> does.
>> >>>
>> >>> HTH,
>> >>> Mark
>> >>>
>> >>>
>> >>>>
>> >>>> The arguments that I passed are as follows:
>> >>>>> raw.data <- read.delim("c33_con3.txt")
>> >>>>> raw.data.2a <- read.delim ("2c33_con3.txt")
>> >>>>> d2a <- raw.data.2a[, 2:5]
>> >>>>> rownames(d2a) <- raw.data.2a[,1]
>> >>>>> group2a <- c(rep("c33", 2), rep("con3", 2))
>> >>>>> d2a <- DGEList(counts = d2a, group = group2a)
>> >>>>> d2a <- estimateCommonDisp(d2a)
>> >>>>> d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
>> >>>>> prior.n2a <- estimateSmoothing(d2a)
>> >>>>> de2a.tgw <- exactTest(d2a, common.disp = FALSE)
>> >>>>> de2a.tgw
>> >>>> An object of class "DGEExact"
>> >>>> $table
>> >>>>
>> >>>> logConc logFC p.value
>> >>>> MGG_00005 | Mo hypothetical protein (1014 nt)
>> >>>> -16.67772 0.05248378 0.9394668
>> >>>> MGG_00015 | Mo catechol O-methyltransferase (1102 nt)
>> >>>> -14.68066 0.36189877 0.2786389
>> >>>> MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)
>> >>>> -13.50677 0.32379041 0.3759259
>> >>>> MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt)
>> >>>> -14.28686 -0.35747999 0.3040601
>> >>>> MGG_00018 | Mo integral membrane protein (2504 nt)
>> >>>> -14.56791 0.45187243 0.1701996
>> >>>> 11452 more rows ...
>> >>>> $comparison
>> >>>> [1] "c33" "con3"
>> >>>> $genes
>> >>>> NULL
>> >>>>
>> >>>>
>> >>>>> sessionInfo()
>> >>>> R version 2.12.1 (2010-12-16)
>> >>>> Platform: i386-pc-mingw32/i386 (32-bit)
>> >>>> locale:
>> >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>> >>>> States.1252 LC_MONETARY=English_United States.1252
>> >>>> [4] LC_NUMERIC=C LC_TIME=English_United
>> >>>> States.1252
>> >>>> attached base packages:
>> >>>> [1] stats graphics grDevices utils datasets methods base
>> >>>> other attached packages:
>> >>>> [1] edgeR_2.0.3
>> >>>> loaded via a namespace (and not attached):
>> >>>> [1] limma_3.6.9 tools_2.12.1
>> >>>>
>> >>>> I would really appreciate your comments or suggestions.
>> >>>>
>> >>>> Many thanks!
>> >>>>
>> >>>> Sridhara
>> >>>>
>> >>>> --
>> >>>> Sridhara G Kunjeti
>> >>>> PhD Candidate
>> >>>> University of Delaware
>> >>>> Department of Plant and Soil Science
>> >>>> email- sridhara@udel.edu
>> >>>> Ph: 832-566-0011
>> >>>>
>> >>>> [[alternative HTML version deleted]]
>> >>>>
>> >>>> _______________________________________________
>> >>>> Bioconductor mailing list
>> >>>> Bioconductor@r-project.org
>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>
>> >>> ------------------------------
>> >>> Mark Robinson, PhD (Melb)
>> >>> Epigenetics Laboratory, Garvan
>> >>> Bioinformatics Division, WEHI
>> >>> e: mrobinson@wehi.edu.au
>> >>> e: m.robinson@garvan.org.au
>> >>> p: +61 (0)3 9345 2628
>> >>> f: +61 (0)3 9347 0852
>> >>> ------------------------------
>> >>>
>> >>>
>> >>> ______________________________________________________________________
>> >>> The information in this email is confidential and intended solely for the
>> >> addressee.
>> >>> You must not disclose, forward, print or use it without the permission of
>> >> the sender.
>> >>> ______________________________________________________________________
>> >>>
>> >>>
>> >>>
>> >>
>> >
>>
>> ------------------------------------------------------------------------
>> Davis J McCarthy
>> Research Technician
>> Bioinformatics Division
>> Walter and Eliza Hall Institute of Medical Research
>> 1G Royal Parade, Parkville, Vic 3052, Australia
>> dmccarthy@wehi.edu.au
>> http://www.wehi.edu.au
>>
>>
>>
>>
>