[BioC] which genes to choose for being truly differentially expressed from a long list
James W. MacDonald
jmacdon at uw.edu
Tue Dec 3 17:15:48 CET 2013
Hi EJ,
This is the time to revisit the original hypothesis you are testing or,
if you are trying to generate hypotheses, to decide which hypotheses
you would find interesting.
In general, a set of differentially expressed genes isn't particularly
compelling on its own, especially when you start to think about
biological relevance and measurement error. For example, it's pretty
easy to come up with a single gene that appears to be differentially
expressed but is in fact a false positive. And what does a single
differentially expressed gene (if a true positive) mean in the greater
biological context of your experiment?
So univariate statistics aren't that useful IMO in this context, and
trying to reduce your set of genes to a tractable number that you can
eyeballometrically test for 'interestingness' is likely not the way to
go. In other words, taking a list of genes and scanning them to see if
you can find particular genes that you already know are implicated in
the process you are examining is not likely to be a useful exercise.
Instead, you might re-consider the rationale for doing this experiment
and see if there is something you can do to further that cause. As an
example, perhaps you are trying to find pathways that are perturbed by
some treatment. There are any number of ways to try to tease that sort
of information out; you can do GO hypergeometric tests (or KEGG tests
if you prefer). Or you could look for interesting gene sets at the
Broad, and do GSEA against those. Or maybe you want to mine the data
for interesting relationships using something like WGCNA (a google
search will get you there). Note that WGCNA is particularly useful if
you have other (preferably continuous) phenotypes.
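
To make the hypergeometric test idea concrete, here is a minimal sketch
of the arithmetic behind a gene-set over-representation test. It is
written in plain Python only to keep it self-contained; in R the same
tail probability is available from phyper(). The function name and all
of the counts below are made up for illustration, and the "universe"
is assumed to be the set of genes that were actually tested on the
array.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the universe (everything that was tested)
    K: genes in the universe that belong to the gene set (e.g. one GO term)
    n: genes called differentially expressed
    k: differentially expressed genes that fall in the gene set
    """
    total = comb(N, n)  # number of ways to pick n genes from the universe
    top = min(K, n)     # the overlap cannot exceed either list
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / total


# Hypothetical counts: 20000 genes tested, 500 called differentially
# expressed, a gene set of 100 genes, 10 of which are in the DE list.
universe, de_genes, set_size, overlap = 20000, 500, 100, 10

expected = de_genes * set_size / universe  # overlap expected by chance
p_value = hypergeom_upper_tail(overlap, set_size, de_genes, universe)

print(f"expected overlap by chance: {expected:.1f}")
print(f"P(overlap >= {overlap}) = {p_value:.3g}")
```

In practice you would run this over many gene sets, so the resulting
p-values need a multiple-testing adjustment before you interpret them.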
Given that you apparently have a big signal here, I would recommend
trying to use that signal rather than chopping away at your data simply
to reduce the dimensionality.
Best,
Jim
On Tuesday, December 03, 2013 10:39:05 AM, Ekta Jain wrote:
> Thank you, James. I actually meant to ask: if I have a list of 500
> genes and they all have fold change values in the range of 2-3, and I
> condense the list with a fold change cutoff of, say, 2.8, I still have
> a list of 200-some genes. How could I eliminate further genes by
> adding more parameters to the threshold that defines them as
> differentially expressed? I am not sure whether there is any such
> thing, but I suppose there might be some definition that helps to
> identify genes as perfect candidates for being differentially expressed.
>
> Appreciate your help,
>
> EJ
>
>
> On 3 December 2013 20:42, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> Hi EJ,
>
>
> On Tuesday, December 03, 2013 10:04:01 AM, EJ [guest] wrote:
>
>
> Dear All,
> I used LIMMA for a dataset on Human plus 2.0 array to get fold
> change values for differentially expressed genes. I have a
> long list of 500 some genes with fold changes > 2 from the
> topTable function. How can I select genes which are most
> differentially expressed from this list?
>
>
> You will have to first define what you mean by 'most
> differentially expressed'. If you mean 'largest fold change' then
> please see ?topTable, specifically the sort.by argument.
>
> Best,
>
> Jim
>
>
>
> Thank you,
> EJ
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
> [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
> [5] LC_TIME=English_India.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] limma_3.16.7 affy_1.38.1 Biobase_2.20.1 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 BiocInstaller_1.10.3 preprocessCore_1.22.0
> [4] zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099