[BioC] Re: Fwd: Statistical approach to compare differentially expressed gene lists

Robert Gentleman rgentlem at fhcrc.org
Tue Dec 29 19:10:54 CET 2009


Hi Qinghua,

 That is not really the way to approach the problem.  You should
consult a local statistician or biostatistician who can help you
set up an appropriate model to test for sex differences (it is
relatively straightforward and could easily be done in limma or just
about any other piece of sensible software).

 A few more comments below.

On Tue, Dec 29, 2009 at 8:49 AM, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> Dear Qinghua,
>
> I am not sure if I would call those differences "very impressive". As your
> samples have different numbers of X and Y chromosomes, I would definitely
> expect many of them to be differentially expressed. After all, no genes on Y
> should be expressed on any of the females, right?

 Not true - there are pseudoautosomal regions that are essentially
duplicated between X and Y (look at just about any annotation package
in Bioc for genes that map to two different chromosomes).  Also, X
inactivation makes life reasonably difficult when working with genes
on the X chromosome.  One would need to develop some sort of sensible
model (almost surely data driven in the first instance).
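
A minimal sketch of the lookup suggested in the parenthesis above, assuming the annotation has already been exported as (gene, chromosome) pairs. The gene IDs are hypothetical and `genes_on_both` is an illustrative helper, not a Bioconductor function. Genes flagged this way may show signal in female samples even though they are annotated to Y.

```python
from collections import defaultdict


def genes_on_both(mapping, chrom_a="X", chrom_b="Y"):
    """Return sorted gene IDs annotated to both chrom_a and chrom_b.

    mapping: iterable of (gene_id, chromosome) pairs; a gene appears
    more than once if it maps to several chromosomes.
    """
    chroms = defaultdict(set)
    for gene, chrom in mapping:
        chroms[gene].add(chrom)
    # Keep genes whose chromosome set contains both requested chromosomes.
    return sorted(g for g, cs in chroms.items() if {chrom_a, chrom_b} <= cs)


# Hypothetical annotation; a real one would come from an annotation package.
annotation = [
    ("geneA", "X"), ("geneA", "Y"),  # hypothetical X/Y-shared gene
    ("geneB", "X"),
    ("geneC", "Y"),
    ("geneD", "7"),
]
print(genes_on_both(annotation))  # ['geneA']
```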

 best wishes
    Robert

>
> The fact that you can trivially discriminate between the two groups should
> suggest that what you are doing is not difficult at all.
>
> As Wolfgang is saying, you need another criterion to define what a "good
> list" is. No statistical test is going to tell you that one list is better
> than the other.
>
> That being said, one of your lists is larger, so it is likely that it
> contains more of the differences between your groups. On the other hand, it
> could also be giving you more false positives. You could look at the extra
> genes and see whether they make sense. In this case, if a lot of the extra
> genes were on the X and Y chromosomes, they would likely be truly
> differentially expressed.
>
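One way to put a number on "a lot of the extra genes" is sketched below, assuming each list is a collection of gene IDs and that a gene-to-chromosome dictionary is available (all names and data here are hypothetical). It takes the genes found only in the larger list and compares their X/Y fraction with the fraction among all annotated genes. This is a descriptive check, not a test of which list is better.

```python
def xy_fraction(genes, chrom_of):
    """Fraction of annotated genes in `genes` that map to X or Y.

    chrom_of: dict gene_id -> chromosome; unannotated genes are skipped.
    Returns NaN if none of the genes are annotated.
    """
    annotated = [g for g in genes if g in chrom_of]
    if not annotated:
        return float("nan")
    return sum(chrom_of[g] in ("X", "Y") for g in annotated) / len(annotated)


# Hypothetical lists and annotation, for illustration only.
list_a = ["g1", "g2", "g3"]
list_b = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
chrom_of = {"g1": "X", "g2": "Y", "g3": "1", "g4": "X",
            "g5": "2", "g6": "3", "g7": "Y"}

extra = sorted(set(list_b) - set(list_a))      # genes only in the larger list
print(extra)                                   # ['g4', 'g5', 'g6', 'g7']
print(xy_fraction(extra, chrom_of))            # 0.5
print(xy_fraction(chrom_of.keys(), chrom_of))  # background fraction
```

With real data the background would be every annotated gene on the array, so a clearly higher X/Y fraction among the extra genes would support Francois's reading.
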
> Keep in mind that you have a large overlap between the lists, so it will be
> more difficult to choose between them, but it also matters much less which
> one you choose.
>
> It would be very convenient if there were a simple test that would tell us
> which method is best for an analysis, but generally no such method exists.
>
> Francois
>
> On 12/29/2009 04:31 AM, qinghua xu wrote:
>>
>> Dear Wolfgang,
>>
>> It is a really nice surprise to have your attention! Thank you!
>>
>> I am sorry that the question was too vague. The detailed picture is that we
>> would like to study gene expression profiles in human peripheral blood
>> and identify DEGs (differentially expressed genes) between males and
>> females. As I mentioned in my previous email, the raw data were
>> preprocessed in two ways: one simply by RMA, and the other by RMA
>> followed by a further adjustment with ComBat
>> (http://statistics.byu.edu/johnson/ComBat/) to remove potential batch
>> effects. The dataset was relatively small, comprising 12 males and 12
>> females. In the end, we got two DEG lists from SAM at FDR = 0.05. The
>> basic idea is to show that, by removing potential batch effects, we are
>> able to extract more information from the gene expression data about the
>> difference between males and females in peripheral blood. On the other
>> hand, we would also like to check whether the additional batch-effect
>> adjustment introduces artificial DEGs.
>>
>> Based on the preliminary results, we observe that the differences between
>> males and females in peripheral blood are very impressive, especially for
>> X- and Y-chromosome-specific genes. Hence, when we plotted ROC curves for
>> both methods, both DEG lists easily reached the maximum AUC = 1. The same
>> holds for the hierarchical clustering heatmap: both DEG lists achieved
>> perfect discrimination.
>>
>> Thanks again!
>>
>> Qinghua
>>
>>
>>
>> ________________________________
>> From: Wolfgang Huber <whuber at embl.de>
>>
>> Cc: bioconductor <bioconductor at stat.math.ethz.ch>;
>> qinghua.xu at as.biomerieux.com
>> Sent: 2009/12/28 (Mon) 4:56:18 PM
>> Subject: Re: [BioC] Fwd: Statistical approach to compare
>> differentially expressed gene lists
>>
>> Dear Qinghua
>>
>> I am afraid your question may be too vague. You will need to define more
>> precisely what you mean by "better". Then, it should be straightforward
>> to compute a quantitative criterion. It wouldn't be wise to wait for
>> someone else to define what is "better" for you.
>>
>> Also, for any analysis method I know of, gene lists depend in a trivial
>> manner on a cut-off (e.g. for p-value, score...), and if you want to do
>> something more meaningful than exegesis of someone's cut-off choice,
>> then I'd suggest plotting ROC curves for both methods, using a reference
>> set of genes that is enriched for "truly differentially expressed" ones.
>>
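A minimal sketch of the ROC comparison described above, assuming each preprocessing method yields a per-gene score over the same set of genes (for example a SAM or t statistic) and that a reference set of genes believed to be truly differentially expressed has been chosen independently of either method. The dictionaries below are made-up toy data, and `roc_points` and `auc` are illustrative helpers. The curve is traced by sweeping the cut-off over all scores, so the comparison does not hinge on any single cut-off choice.

```python
def roc_points(scores, reference):
    """ROC curve as (FPR, TPR) points, ranking genes by decreasing score.

    scores: dict gene_id -> score (larger = stronger evidence of DE).
    reference: set of gene IDs assumed to be truly DE.
    Genes tied on score are passed together, giving a diagonal step.
    """
    pos = sum(1 for g in scores if g in reference)
    neg = len(scores) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both reference and non-reference genes")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][1]
        # Lower the cut-off past every gene whose score equals `cutoff`.
        while i < len(ranked) and ranked[i][1] == cutoff:
            if ranked[i][0] in reference:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: hypothetical per-gene scores from two preprocessing methods
# and a hypothetical reference set.
reference = {"g1", "g2"}
method_a = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 1.0}
method_b = {"g1": 5.0, "g3": 4.0, "g2": 3.0, "g4": 1.0}

print(auc(roc_points(method_a, reference)))  # 1.0
print(auc(roc_points(method_b, reference)))  # 0.75
```

With real data the score dictionaries would hold the statistics for all genes on the array, not only those passing a cut-off. This ROC is computed over genes (reference vs. non-reference), which is a different quantity from the sample-level AUC = 1 (male vs. female) reported earlier in the thread.
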
>> Best wishes
>>     Wolfgang
>>
>>
>>> Dear all,
>>>
>>> I have identified two lists of differentially expressed genes from the
>>> same expression data, treated with different normalisation
>>> methods. List A contains 995 genes and list B contains 2400 genes.
>>> More than nine hundred genes overlap between the two lists;
>>> that is, most of the genes in list A are also included in list B. The
>>> idea is to check whether list B is better than list A.
>>>
>>> In addition to visualisation approaches (like hierarchical clustering
>>> heatmaps) or biological interpretation, I am wondering whether there is
>>> any other statistical approach available to compare two differentially
>>> expressed gene lists.
>>>
>>> I would appreciate any advice, or pointers to any references for
>>> this!
>>>
>>> Bests, Qinghua
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>



-- 
Robert Gentleman
rgentlem at gmail.com


