[BioC] Identifying Processes as Upregulated or Downregulated

Joseph Shaw josph.sh at gmail.com
Fri Feb 14 01:13:31 CET 2014


Hi Jim,

Thanks so much for clearing that up for me!

Joseph

On Tue, Feb 11, 2014 at 9:04 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Joseph,
>
> The flaw in your reasoning is here:
>
>
> "Let's assume that the process represented by GO_A is such that it
> cannot be simultaneously upregulated and downregulated;"
>
> You aren't measuring a process. You are measuring gene expression. And the
> up-regulation and down-regulation of genes can have inhibitory or excitatory
> effects on a particular process.
>
> In addition, GO terms aren't even necessarily related to a single process.
> Instead, we use them as a stand-in for the underlying pathways that we hope
> to measure (but don't really know much about). If we had better pathway
> information we wouldn't even be bothering with GO terms at all.
>
> So you can certainly contrive a situation where you would only want to
> consider up-regulated genes for a particular GO term, but that situation is
> unlikely to hold in general. And when you are doing multiple
> hypergeometric tests, using all the GO terms in your universe, it is not IMO
> a good idea to make very strong assumptions, especially if you don't need to
> do so.
>
> Best,
>
> Jim
>
>
>
>
> On Tuesday, February 11, 2014 3:36:13 PM, Joseph Shaw wrote:
>>
>> Hi Jim,
>>
>> Thanks for your reply.
>>
>> My worry, originally, was that failure to differentiate between
>> upregulated and downregulated processes would lead to spurious
>> results.
>>
>> Let's create another scenario. Assume we have a group of genes
>> identified as upregulated and another group of genes identified as
>> downregulated. Furthermore, assume two subsets: one belonging to the
>> upregulated group and one belonging to the downregulated group. Each
>> subset is associated with several GO terms including one GO term which
>> is common to both subsets - let's call this common term GO_A.
>> Now, it may be the case that, individually, when tested against a
>> defined gene universe, neither subset yields statistically significant
>> results for GO_A, but combining the aforementioned subsets and testing
>> against a gene universe does, in fact, yield a statistically
>> significant result for GO_A.
>> Let's assume that the process represented by GO_A is such that it
>> cannot be simultaneously upregulated and downregulated; if this is the
>> case, wouldn't it be incorrect to combine the upregulated and
>> downregulated gene lists?
>>
>> Let's return to the example provided in your previous mail.
>> My understanding of the GO DAG is far from exhaustive, so it's very
>> possible that I'm wrong, but, given that the GO terms become more
>> specific as we move towards leaf nodes, would we eventually arrive at
>> terms representative of negative regulation of programmed cell death
>> and positive regulation of programmed cell death?
>> If this is the case, assume there was a sufficient number of genes
>> identified as differentially expressed for both enhancer (identified
>> as upregulated in our experiment) and preventer (identified as
>> downregulated in our experiment) genes so as to yield statistically
>> significant results for separate tests. Would it be incorrect to
>> conclude that negative regulation of preventers of programmed cell
>> death and positive regulation of enhancers of programmed cell death
>> have both been shown to be statistically significant? It
>> seems to me that both these results are compatible.
>>
>> Joseph
>>
>> On Tue, Feb 11, 2014 at 2:00 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>> Hi Joseph,
>>>
>>> I think you are making a simplifying assumption that isn't helpful. In
>>> other words, you are assuming that up-regulation of a set of genes means
>>> something different than down-regulation, or a mixture thereof. But this
>>> flies in the face of much that we know about biological processes.
>>>
>>> As an example, say we have a set of genes with 'programmed cell death' as
>>> their GO term. And further assume that some of these genes enhance this
>>> process, and some prevent the process. Now if most of the enhancers are
>>> up-regulated, and most of the 'preventers' are down-regulated, are you
>>> prepared to say these genes should be tested separately because the
>>> up-regulated genes are involved with a different process than the
>>> down-regulated genes?
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> On Monday, February 10, 2014 6:43:52 PM, Joseph Shaw wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I am in the process of performing some ontological analysis with
>>>> GOstats. Given that GOstats doesn't require any information on
>>>> relative increases or decreases in expression for its hypergeometric
>>>> testing procedure, am I correct in assuming that it does not
>>>> differentiate between upregulated and downregulated genes?
>>>>
>>>> If this is the case then providing a list of differentially expressed
>>>> genes (both upregulated and downregulated) to the testing procedure
>>>> will result in ontology results where upregulation and downregulation
>>>> may be confounded.
>>>> In other words, combining upregulated and downregulated genes and
>>>> comparing the resulting list to the gene universe will enable the
>>>> testing procedure to identify regulated ontological processes, but it
>>>> won't be able to identify whether the processes are upregulated or
>>>> downregulated. In fact, given that there is no distinction provided as
>>>> input, it may even be both.
>>>>
>>>> To me, it seems that in order to prevent this from happening two
>>>> separate testing procedures should be performed: one comparing
>>>> upregulated genes to the gene universe and one comparing downregulated
>>>> genes to the gene universe. Is this approach advisable? Is there a
>>>> correct protocol which addresses the above issue?
>>>>
>>>> Joseph
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099

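The GO_A scenario discussed in the thread (neither the upregulated nor the
downregulated list is significant for a term on its own, but the combined
list is) can be illustrated with a small sketch. This is not GOstats code; it
is a plain one-sided hypergeometric test written with the Python standard
library, and every count below (universe size, number of genes annotated to
GO_A, list sizes, overlaps) is a made-up assumption chosen only to show the
effect.

```python
from math import comb


def hypergeom_upper_tail(overlap, universe, annotated, selected):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `annotated` carry the GO term."""
    total = comb(universe, selected)
    max_overlap = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers, not taken from any real experiment.
UNIVERSE = 10_000   # genes in the gene universe
ANNOTATED = 100     # genes in the universe annotated to GO_A

UP_SIZE, UP_IN_TERM = 100, 3      # upregulated list and its GO_A members
DOWN_SIZE, DOWN_IN_TERM = 100, 3  # downregulated list and its GO_A members

p_up = hypergeom_upper_tail(UP_IN_TERM, UNIVERSE, ANNOTATED, UP_SIZE)
p_down = hypergeom_upper_tail(DOWN_IN_TERM, UNIVERSE, ANNOTATED, DOWN_SIZE)

# The two lists are disjoint (a gene is either up or down), so the combined
# list is simply their union.
p_combined = hypergeom_upper_tail(
    UP_IN_TERM + DOWN_IN_TERM, UNIVERSE, ANNOTATED, UP_SIZE + DOWN_SIZE
)

print(f"upregulated only:   p = {p_up:.4f}")
print(f"downregulated only: p = {p_down:.4f}")
print(f"combined list:      p = {p_combined:.4f}")
```

With these assumed counts, each direction alone stays above a 0.05 cutoff
while the combined list falls below it. The test only sees which genes are in
the list, not their direction of change, which is why the interpretation of a
significant term has to come from looking at the genes themselves.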