[BioC] Large # of significant genes with SAM

Joern Toedling toedling at ebi.ac.uk
Tue May 10 17:35:31 CEST 2005


Hi Vincent,

I imagine such large numbers of differentially expressed genes could 
arise for various reasons.
One possibility is that there are large technical or experimental
differences between your tumour and control samples, for example due to
scanner settings or hybridisation protocols. I would check whether such
large differences between the groups are still obvious after
normalisation, using boxplots, scatter plots, etc. (many examples of such
quality-control procedures can be found on the Bioconductor website,
especially on the pages with course and workshop material). If so, you
might consider other normalisation methods, or combining the two groups'
data in another way if they turn out to be too different.
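A rough numerical counterpart to the boxplots is to compare the per-array medians of the two groups after normalisation. The sketch below is a minimal Python illustration under assumptions: each array is a plain list of normalised log-ratios, and the names `array_quartiles` and `median_shift` are hypothetical helpers, not part of any Bioconductor package. The data are made up.

```python
import statistics
from typing import Dict, List, Tuple


def array_quartiles(values: List[float]) -> Tuple[float, float, float]:
    """Lower quartile, median and upper quartile of one array,
    i.e. the box that a boxplot would draw for it."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3


def median_shift(arrays: Dict[str, List[float]], group: Dict[str, str],
                 a: str, b: str) -> float:
    """Mean per-array median of group `a` minus that of group `b`.

    After a good normalisation this should be close to 0; a large shift
    hints at a systematic technical difference between the groups.
    """
    meds = {name: statistics.median(v) for name, v in arrays.items()}
    med_a = statistics.mean(m for name, m in meds.items() if group[name] == a)
    med_b = statistics.mean(m for name, m in meds.items() if group[name] == b)
    return med_a - med_b


# Toy, made-up log-ratios for two tumour and two normal arrays.
arrays = {
    "t1": [0.8, 0.9, 1.0, 1.1, 1.2],
    "t2": [0.9, 1.0, 1.1, 1.2, 1.3],
    "n1": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "n2": [-0.1, -0.05, 0.0, 0.05, 0.1],
}
group = {"t1": "tumour", "t2": "tumour", "n1": "normal", "n2": "normal"}
print(array_quartiles(arrays["t1"]))
print(median_shift(arrays, group, "tumour", "normal"))
```

In this toy example the tumour medians sit about one log-unit above the normal ones; a shift like that in real data would be worth investigating before trusting any gene list.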
Another possibility is that there really are huge biological
differences between the two groups. For instance, when analysing
T- versus B-lymphocytes, one usually observes large percentages (> 20%)
of differentially expressed genes, since in that case very different
cell types are being compared. However, I would not expect such striking
differences between a tumour and the corresponding normal tissue. To
check whether there really are large biological differences between the
two groups, you could also see whether the lists of significantly up- or
down-regulated genes point to a precise biological picture, for example
by using Bioconductor's "GOstats" package and looking for relationships
between the most significant GO nodes.
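The over-representation test behind a GO analysis of this kind is typically a one-sided hypergeometric test (GOstats provides hypergeometric-test functions for this). The minimal Python sketch below computes that tail probability for a single GO term; it is not the GOstats implementation, the function name `go_enrichment_pvalue` is hypothetical, and the counts in the example are made up.

```python
from math import comb


def go_enrichment_pvalue(k: int, n_sig: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for one GO term.

    N     -- number of genes on the array (the "universe")
    K     -- number of those genes annotated to the GO term
    n_sig -- number of significant genes
    k     -- number of significant genes annotated to the GO term
    """
    total = comb(N, n_sig)
    # comb(a, b) returns 0 when b > a, so impossible terms contribute 0.
    hits = sum(comb(K, i) * comb(N - K, n_sig - i)
               for i in range(k, min(K, n_sig) + 1))
    return hits / total


# Made-up counts: 5000 genes, 40 in the term, 500 significant,
# 12 of which are in the term (about 4 would be expected by chance).
print(go_enrichment_pvalue(k=12, n_sig=500, K=40, N=5000))
```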
Since SAM computes a regularised t-statistic, I think you should also
check that the normality assumption holds at least approximately.
Double-checking the results with other methods might be a good idea,
and since finding differentially expressed genes is a standard task, a
large number of methods and packages are available for it. Again, see
the courses/compendiums/materials section of the Bioconductor website.
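To make "regularised t-statistic" concrete: in the unpaired two-class case, SAM's score for a gene is the difference of the group means divided by its pooled standard error plus a small positive constant s0 (the "fudge factor"), d = (mean1 - mean2) / (s + s0). The Python sketch below assumes s0 is simply given (siggenes chooses it from the data), and it omits the permutation step SAM uses to estimate the FDR. For patient-matched samples a paired variant would be the closer fit. The function name `sam_d` and the data are hypothetical.

```python
import math
from typing import Sequence


def sam_d(x: Sequence[float], y: Sequence[float], s0: float) -> float:
    """SAM-style regularised t-statistic for one gene, two unpaired groups.

    d = (mean(x) - mean(y)) / (s + s0), where s is the pooled standard
    error of the difference in means. With s0 = 0 this reduces to the
    ordinary equal-variance two-sample t-statistic; s0 > 0 damps the
    scores of genes whose variance happens to be tiny.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Made-up expression values for one gene.
tumour = [2.1, 2.4, 1.9, 2.6]
normal = [1.0, 1.2, 0.9, 1.1]
print(sam_d(tumour, normal, s0=0.0))  # plain t-statistic
print(sam_d(tumour, normal, s0=0.1))  # shrunk towards 0
```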
You might also consider using other packages, such as "twilight", to
obtain an estimate of the percentage of differentially expressed genes
in your data.
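twilight has its own estimation procedure. As a simpler stand-in (explicitly not what twilight does), Storey's estimator gives a quick rough number from the p-values alone: the proportion of non-differentially expressed genes is estimated as pi0(lambda) = #{p_i > lambda} / (m * (1 - lambda)), and 1 - pi0 estimates the fraction of differentially expressed genes. A minimal Python sketch, assuming one p-value per gene is already available; lambda = 0.5 is an arbitrary but common choice, and the p-values are made up.

```python
from typing import Sequence


def storey_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey's estimate of the proportion of truly null (non-DE) genes.

    Null p-values are uniform on [0, 1], so about pi0 * m * (1 - lam) of
    them should exceed lam; genes with real effects mostly do not.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Made-up p-values for ten genes.
pvals = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.55, 0.7, 0.8, 0.95]
pi0 = storey_pi0(pvals)
print(f"estimated fraction of DE genes: {1 - pi0:.2f}")  # -> 0.20
```

On real data it is worth checking how sensitive the estimate is to the choice of lambda.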

Best regards,
Joern

Vincent Detours wrote:

>Dear all,
>
>Your expert opinions are most welcome on the following.
>
>Using siggenes' SAM at q < 0.05 (26 samples on cDNA chips), I find
>that 37% of all genes are regulated with respect to patient-matched
>"normal" tissues in some tumors not particularly known for huge
>aneuploidy. On another data set from the same cancer, collected by
>another group on independent samples with Affymetrix chips, I got 34%.
>The number seems to hold.
>
>How should this be interpreted? Are 30% of the genes really disturbed,
>even to a small extent, in these tumors? Could SAM be doing something
>wrong? If so, how can I verify it?
>
>Any advice, shared experience, references, etc. are welcome.
>
>Cheers
>
>Vincent
>
>
>------------------------------------------
>Vincent Detours, Ph.D.
>IRIBHM
>Bldg C, room C.4.116
>ULB, Campus Erasme, CP602
>808 route de Lennik
>B-1070 Brussels
>Belgium
>
>Phone: +32-2-555 4220
>Fax: +32-2-555 4655
>
>E-mail: vdetours at ulb.ac.be
>
>URL: http://homepages.ulb.ac.be/~vdetours/
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>  
>


