Hello Davis,
Thank you very much for your comments. I had read the documentation but did
not catch this. With your comments, it is now clear to me.

Many thanks,
Sridhara


On Tue, May 31, 2011 at 7:26 PM, Davis McCarthy <dmccarthy@wehi.edu.au> wrote:

> Hi Sridhara
>
> The function exactTest() has an argument 'pair'. This argument can be used
> to define the two groups you wish to compare. The order in which you put in
> the groups in the pair will determine the direction of the log-fold change.
>
> For instance if you called
> > exactTest(d1, common.disp=FALSE, pair=c("con3","dca3"))
> then you will get the comparison dca3 - con3. If you put in
> pair=c("dca3","con3") then you will get the comparison con3-dca3.
>
> This is a much better approach to making different comparisons between
> groups than renaming the columns of your count matrix.
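The sign convention described above can be illustrated without edgeR. What follows is a minimal plain-R sketch, not edgeR's actual computation: `toy_logfc` is a hypothetical helper, the counts are made up, and the only point is that reversing `pair` flips the sign of the log-fold change.

```r
# Toy counts: two control (con3) and two treated (dca3) libraries.
counts <- matrix(c(10, 12, 40, 44,
                   50, 48, 12, 13),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneUp", "geneDown"),
                                 c("con3.1", "con3.2", "dca3.1", "dca3.2")))
group <- factor(c("con3", "con3", "dca3", "dca3"))

# Hypothetical helper (NOT an edgeR function): log2 ratio of group means,
# reported as pair[2] - pair[1], the direction described in the email.
toy_logfc <- function(counts, group, pair) {
  m1 <- rowMeans(counts[, group == pair[1], drop = FALSE])
  m2 <- rowMeans(counts[, group == pair[2], drop = FALSE])
  log2(m2 / m1)
}

# Treatment relative to control: positive means up-regulated in dca3.
fc_treat_vs_ctrl <- toy_logfc(counts, group, pair = c("con3", "dca3"))
# Reversed pair: same magnitudes, opposite signs.
fc_ctrl_vs_treat <- toy_logfc(counts, group, pair = c("dca3", "con3"))
```

Keeping the control group first in `pair` for every comparison should therefore give a consistent direction across comparisons, regardless of column order in the input file.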
>
> In principle, ?exactTest should have been sufficient to answer this
> question for you. It's considered good form to read the documentation
> thoroughly before posting to the list.
>
> plotSmear() operates on DGEExact objects (output from exactTest), so this
> avoids the final issue you're worried about entirely --- direction of the
> log-fold change on the plot will match that in your exact test. Again,
> ?plotSmear has details.
>
> Cheers
> Davis
>
>
> On Jun 1, 2011, at 4:35 AM, Sridhara Gupta Kunjeti wrote:
>
> Hello Davis,
> Yes, this helped me to solve the problem. However, I have a different
> question, also related to exactTest(). The first two columns in my input
> files are the counts for the control group.
> =============================================
> example 1: Textfile1
> Gene     con3-1   con3-2    dca-1    dca-2
> When I run exactTest():
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups:  dca3 - con3
>
> So, if the logFC is positive, the gene is up-regulated in dca3, and
> these dots are plotted above '0' by plotSmear().
>
> ================================================
> example 2:
> When I swap the columns
> Gene     dca-1    dca-2       con3-1      con3-2
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups:  con3 - dca3
>
> Here, if the logFC is negative, the gene is up-regulated in dca3, and
> these dots are plotted below '0' by plotSmear().
>
> The bottom line is that if I swap the columns, exactTest() changes the
> order of the pair. In other words, the comparison *changes* from dca3 -
> con3 to con3 - dca3.
>
> This worked absolutely fine with 6 pairs. But for four pairs, even when I
> swap the columns in the input data, the order in exactTest() does not
> change, i.e., con3 - c33 *does not change* to c33 - con3.
>
> My worry is that a positive logFC means up-regulation in the treatment for
> some pairs, while a negative logFC means the same for others. I assume this
> inconsistency will be a problem when I generate the plotSmear() plots.
>
> Any help in generating consistent logFC values (positive for up-regulation
> in treatment) will be appreciated.
>
> Thanks,
> Sridhara
>
>
>
> On Mon, May 30, 2011 at 2:33 AM, Davis McCarthy <dmccarthy@wehi.edu.au> wrote:
>
>> Hi Sridhara
>>
>> I'm not sure I completely follow what you're saying about the FDRs being
>> 0.5, 0.6 etc. Can you show us a top table? Output of topTags(). Actually it
>> would be good to see all of your edgeR function calls to get a better idea
>> of how you're carrying out your analysis. In principle I don't think that
>> "0" in the data will have any adverse effects on your analysis, so I'm not
>> really sure what the results are that you're trying to describe.
>>
>> If you are in an R 2.13 session and enter the commands:
>>
>> source("http://www.bioconductor.org/biocLite.R")
>> biocLite("edgeR")
>>
>> then edgeR version 2.2.5 will be installed on your system. I would
>> recommend following the latest version of the edgeR User's Guide, which was
>> released with edgeR 2.2.x. You can get it from edgeR's Bioconductor page:
>> http://www.bioconductor.org/packages/2.8/bioc/html/edgeR.html
>>
>> Hope that helps.
>>
>> Cheers
>> Davis
>>
>>
>> On May 27, 2011, at 9:36 PM, Sridhara Gupta Kunjeti wrote:
>>
>> Hello Davis,
>> Thank you very much for your email. After looking at one of my
>> comparisons, the p-values make total sense. But I did notice that out of
>> 10827 genes, most of them (10820) had an *FDR* of 1 and the rest had an
>> FDR of 0.5, 0.6, 0.7, 0.8 and so on. I was wondering whether "0" counts
>> in the data could cause these FDR values?
>>
>> I will also install R 2.13 and the latest edgeR. Could you please let me
>> know the latest version of edgeR that is available for me to download? I
>> am assuming I can still follow the same manual (from version 2.0.3) for
>> the new version of edgeR.
>>
>> Many thanks!
>> Sridhara
>>
>>
>> On Thu, May 26, 2011 at 10:52 PM, Davis McCarthy <dmccarthy@wehi.edu.au> wrote:
>>
>>> Hi Sridhara
>>>
>>> I do not think that genes with all-zero counts for group A and group C
>>> are causing the results you see.
>>>
>>> I just tested this on a dataset with 9 groups, and comparing two groups,
>>> A and B, with 285 genes with all zero counts in groups A and B yielded
>>> "expected" p-values and FDRs. Therefore I do not think that your p-values
>>> all being 1 is driven by these all-zero genes.
>>>
>>> Is there truly very little difference in expression between groups A and
>>> C relative to biological variability in your data? You could have a look at
>>> the counts (raw, normalized or counts per million) for the top-ranked (even
>>> if not significant) genes for your group A - group C comparison.
>>>
>>> If you see little difference in expression between the groups for the top
>>> genes then you may have no differential expression between these groups. If,
>>> on the other hand, there do appear to be large differences in expression
>>> between the groups then you may have found a bug in the p-values that are
>>> being output and we can go ahead and try to fix the issue.
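The check suggested above (looking at counts per million for the top genes) can be sketched in plain R. This is only an illustration with made-up counts; it does not use edgeR's own normalization, and the gene and sample names are assumptions.

```r
# Made-up counts for three genes across groups A and C (two replicates each).
counts <- matrix(c(120, 102, 30, 40,
                     0,   0,  0,  0,
                    55,  60, 58, 61),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("A1", "A2", "C1", "C2")))

# Counts per million: scale each library (column) by its total count.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Flag genes with zero counts in every library being compared.
all_zero <- rowSums(counts) == 0

# Mean CPM per group, to eyeball whether the groups differ at all.
group_means <- cbind(A = rowMeans(cpm[, c("A1", "A2")]),
                     C = rowMeans(cpm[, c("C1", "C2")]))
```

Comparing the columns of `group_means` for the top-ranked genes gives a rough sense of whether the groups differ relative to the spread between replicates.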
>>>
>>> I notice that you are using R 2.12 and edgeR version 2.0.3. I would
>>> recommend updating to R 2.13 and the latest release of edgeR---there have
>>> been many improvements made to the package since version 2.0.3 and any bug
>>> fixes (if required) will roll out to the current release and devel versions,
>>> not legacy versions of the package.
>>>
>>> Cheers
>>> Davis
>>>
>>>
>>>
>>> On May 26, 2011, at 6:16 AM, Sridhara Gupta Kunjeti wrote:
>>>
>>> > Hello Mark,
>>> > Thank you very much for your email. It greatly helped me to export the FDR,
>>> > p-value, logFC and logConc into csv format.
>>> > I have one quick question, which is more of a statistical question.
>>> > After exporting the FDR, I started analyzing pair by pair. In the example
>>> > below, when comparing group A - group B, I got p-values and FDRs that make
>>> > sense. But when I checked the group A - group C comparison, all 10,000
>>> > genes had an FDR and p-value of 1. I then counted the number of genes that
>>> > had "0" in both groups for both replicates, and it turned out to be about
>>> > 400 genes. So, my question is why the other genes (9600) had an FDR and
>>> > p-value of "1". Do you think the 400 genes with "0" counts would affect the
>>> > analysis? Do I need to delete these 400 genes for the pair (gp A - gp C)
>>> > comparison and then run an edgeR analysis individually?
>>> >
>>> >                    Group A         Group B         Group C
>>> > Genes              A1     A2       B1     B2       C1     C2
>>> >    1               0      0        11     12       0      0
>>> >    2               120    102      45     38       30     40
>>> >
>>> >
>>> > Any help or comments will be appreciated.
>>> >
>>> > Many thanks!
>>> > Sridhara
>>> >
>>> >
>>> > On Sun, May 22, 2011 at 4:24 PM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>>> >
>>> >> Hi Sridhara,
>>> >>
>>> >> The problem here is that the output of topTags() (your 'fdr06') is not a
>>> >> data.frame or matrix, which is what write.table() works best on. Instead,
>>> >> try:
>>> >>
>>> >> fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH",
>>> >>                  sort.by = "p.value")
>>> >> write.table(fdr06$table, file = "FDR06.csv", sep = ",")
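The failure and the fix can be reproduced in plain R without edgeR. In this minimal sketch, `make_toptags_like` is a hypothetical stand-in that builds a list shaped roughly like a topTags() result (a table plus two metadata elements); the real object comes from edgeR and may differ in detail.

```r
# Hypothetical stand-in for a topTags() result: a table plus metadata.
make_toptags_like <- function(n) {
  list(table = data.frame(logFC = seq_len(n) / 10,
                          PValue = seq_len(n) / (n + 1),
                          row.names = paste0("gene", seq_len(n))),
       adjust.method = "BH",
       comparison = c("T6", "P18"))
}

odd <- make_toptags_like(5)
out <- tempfile(fileext = ".csv")

# write.table() calls data.frame() on anything that is not already a
# data.frame or matrix. With 5 rows, the length-2 'comparison' element
# cannot be recycled, so the conversion fails.
whole_list_error <- tryCatch({
  write.table(odd, file = out, sep = ",", col.names = NA)
  NA_character_
}, error = function(e) conditionMessage(e))

# Writing only the table component works for any number of rows.
write.table(odd$table, file = out, sep = ",", col.names = NA)
written <- read.csv(out, row.names = 1)
```

If the real object behaves like this stand-in, recycling would also explain why an even row count such as 10426 ran without error while 10427 did not: data.frame() recycles shorter elements only when their length divides the number of rows.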
>>> >>
>>> >> Cheers,
>>> >> Mark
>>> >>
>>> >> On May 22, 2011, at 11:02 PM, Sridhara Gupta Kunjeti wrote:
>>> >>
>>> >>> Hello Mark,
>>> >>> Thanks for your email. I have one quick question. Is it possible to
>>> >>> export all the 10,427 genes after running exactTest()? What argument do I
>>> >>> need to use to do that? Basically, I want the complete list of genes with
>>> >>> the following info:
>>> >>>> topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
>>> >>> Comparison of groups: T6-P18
>>> >>>                                                               logConc     logFC       PValue          FDR
>>> >>> PITG_08841 | Pi conserved hypothetical protein (129 nt)     -28.79463 42.442850 1.032735e-11 1.076833e-07
>>> >>> PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt)  -12.93992  9.148329 1.288618e-09 6.193586e-06
>>> >>>
>>> >>> If I use the following commands, I get an error message.
>>> >>>
>>> >>> fdr06 <- topTags(de06.tgw, n = 10427, adjust.method = "BH",
>>> >>>                  sort.by = "p.value")
>>> >>> write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA,
>>> >>>             qmethod="double")
>>> >>> Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, :
>>> >>>   arguments imply differing number of rows: 10427, 1, 2
>>> >>>
>>> >>> If I do the same with n = 10426, it executes without any error,
>>> >>> except that I am missing one row.
>>> >>>
>>> >>> Any suggestions on how to export all the columns for all the rows will be
>>> >>> a great help.
>>> >>>
>>> >>> Many thanks!
>>> >>> Sridhara
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, May 22, 2011 at 5:34 AM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>>> >>> Hi Sridhara,
>>> >>>
>>> >>> If you haven't already, you might have a solid read of the edgeR user's
>>> >>> guide; it has answers to some of your questions.
>>> >>>
>>> >>>
>>> >>> On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>> I have used edgeR for DGE analysis and I have a few questions regarding
>>> >>>> the model and comparisons.
>>> >>>>
>>> >>>> 1) What kind of statistical model is taken into account to analyze
>>> >>>> treatment structure and conduct analysis of variance?
>>> >>>
>>> >>> For the example you show below (a 2-group comparison), the 'Negative
>>> >>> binomial models' Section in the user's guide covers this.  Of course, the
>>> >>> package has facility for more complicated "treatment structure" through
>>> >>> generalized linear models (See the 'Experiment with multiple factors'
>>> >>> Section, for example).
>>> >>>
>>> >>>
>>> >>>> 2) How does edgeR correct for multiple comparisons?
>>> >>>
>>> >>> See ?topTags; it's also mentioned in the user's guide.
>>> >>>
>>> >>> ----
>>> >>>    topTags(object, n=10, adjust.method="BH", sort.by="p.value")
>>> >>> ...
>>> >>> adjust.method: character string stating the method used to adjust
>>> >>>         p-values for multiple testing, passed on to 'p.adjust'
>>> >>> ...
>>> >>> ----
>>> >>>
>>> >>>
>>> >>>> 3) I am assuming that the p-values calculated in the output after
>>> >>>> performing the tagwise dispersion estimation are already adjusted for
>>> >>>> multiple testing. Please correct me if I am wrong. If so, what kind of
>>> >>>> multiple testing correction is applied?
>>> >>>
>>> >>> exactTest() doesn't do the multiple testing correction, but topTags()
>>> >>> does.
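The adjustment step mentioned above can be tried directly with base R's p.adjust(), the function the topTags() help excerpt says the p-values are passed on to. This is a minimal sketch with made-up p-values and gene names, not output from edgeR.

```r
# Raw p-values such as those reported by an exact test (made-up numbers).
p_raw <- c(geneA = 0.0004, geneB = 0.012, geneC = 0.03,
           geneD = 0.2, geneE = 0.9)

# Benjamini-Hochberg adjustment, the method named by adjust.method = "BH".
fdr <- p.adjust(p_raw, method = "BH")

# Combine raw and adjusted values, sorted by raw p-value.
result <- data.frame(PValue = p_raw, FDR = fdr)
result <- result[order(result$PValue), ]
```

Adjusted values are never smaller than the raw p-values, so a table in which almost every raw p-value is large will show FDR values at or near 1.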
>>> >>>
>>> >>> HTH,
>>> >>> Mark
>>> >>>
>>> >>>
>>> >>>>
>>> >>>> The arguments that I passed are as follows:
>>> >>>>> raw.data <- read.delim("c33_con3.txt")
>>> >>>>> raw.data.2a <- read.delim ("2c33_con3.txt")
>>> >>>>> d2a <- raw.data.2a[, 2:5]
>>> >>>>> rownames(d2a) <- raw.data.2a[,1]
>>> >>>>> group2a <- c(rep("c33", 2), rep("con3", 2))
>>> >>>>> d2a <- DGEList(counts = d2a, group = group2a)
>>> >>>>> d2a <- estimateCommonDisp(d2a)
>>> >>>>> d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
>>> >>>>> prior.n2a <- estimateSmoothing(d2a)
>>> >>>>> de2a.tgw <- exactTest(d2a, common.disp = FALSE)
>>> >>>>> de2a.tgw
>>> >>>> An object of class "DGEExact"
>>> >>>> $table
>>> >>>>                                                                      logConc       logFC   p.value
>>> >>>> MGG_00005 | Mo hypothetical protein (1014 nt)                      -16.67772  0.05248378 0.9394668
>>> >>>> MGG_00015 | Mo catechol O-methyltransferase (1102 nt)              -14.68066  0.36189877 0.2786389
>>> >>>> MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)            -13.50677  0.32379041 0.3759259
>>> >>>> MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt) -14.28686 -0.35747999 0.3040601
>>> >>>> MGG_00018 | Mo integral membrane protein (2504 nt)                 -14.56791  0.45187243 0.1701996
>>> >>>> 11452 more rows ...
>>> >>>> $comparison
>>> >>>> [1] "c33"  "con3"
>>> >>>> $genes
>>> >>>> NULL
>>> >>>>
>>> >>>>
>>> >>>>> sessionInfo()
>>> >>>> R version 2.12.1 (2010-12-16)
>>> >>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>> >>>> locale:
>>> >>>> [1] LC_COLLATE=English_United States.1252
>>> >>>> [2] LC_CTYPE=English_United States.1252
>>> >>>> [3] LC_MONETARY=English_United States.1252
>>> >>>> [4] LC_NUMERIC=C
>>> >>>> [5] LC_TIME=English_United States.1252
>>> >>>> attached base packages:
>>> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> >>>> other attached packages:
>>> >>>> [1] edgeR_2.0.3
>>> >>>> loaded via a namespace (and not attached):
>>> >>>> [1] limma_3.6.9  tools_2.12.1
>>> >>>>
>>> >>>> I would really appreciate your comments or suggestions.
>>> >>>>
>>> >>>> Many thanks!
>>> >>>>
>>> >>>> Sridhara
>>> >>>>
>>> >>>> --
>>> >>>> Sridhara G Kunjeti
>>> >>>> PhD Candidate
>>> >>>> University of Delaware
>>> >>>> Department of Plant and Soil Science
>>> >>>> email- sridhara@udel.edu
>>> >>>> Ph: 832-566-0011
>>> >>>>
>>> >>>>
>>> >>>> _______________________________________________
>>> >>>> Bioconductor mailing list
>>> >>>> Bioconductor@r-project.org
>>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >>>
>>> >>> ------------------------------
>>> >>> Mark Robinson, PhD (Melb)
>>> >>> Epigenetics Laboratory, Garvan
>>> >>> Bioinformatics Division, WEHI
>>> >>> e: mrobinson@wehi.edu.au
>>> >>> e: m.robinson@garvan.org.au
>>> >>> p: +61 (0)3 9345 2628
>>> >>> f: +61 (0)3 9347 0852
>>> >>> ------------------------------
>>> >>>
>>> >>>
>>> >>>
>>> >>> ______________________________________________________________________
>>> >>> The information in this email is confidential and intended solely for the
>>> >>> addressee.
>>> >>> You must not disclose, forward, print or use it without the permission of
>>> >>> the sender.
>>> >>> ______________________________________________________________________
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>>
>>> ------------------------------------------------------------------------
>>> Davis J McCarthy
>>> Research Technician
>>> Bioinformatics Division
>>> Walter and Eliza Hall Institute of Medical Research
>>> 1G Royal Parade, Parkville, Vic 3052, Australia
>>> dmccarthy@wehi.edu.au
>>> http://www.wehi.edu.au
>>>
>>>
>>>
>>>
>>>
>>
>>
>>