[BioC] Identifying signatures across many samples?

Mon May 10 17:13:27 CEST 2004

Hello all,

I am looking to identify common expression profiles (signatures) across many different pairwise comparisons. Essentially, I have many diverse Affymetrix data sets from different tissues, and for each tissue type I have one sample that expresses my phenotype of interest and one or more others that do not. I am interested in identifying which mRNA transcripts are selectively up- or down-regulated for that phenotype. (Obviously, this is a broadly defined phenotype, since it is observed in several different tissue types. While many genes will be tissue specific, I am hoping that the similar tissues that don't express the phenotype will control for this.)

So far, I have simply used the affy package from Bioconductor to pre-process and summarize the data, identified those genes whose fold change within a tissue-type comparison reaches a given threshold for my phenotype of interest, and then asked for which genes this holds across all tissue types. (Many samples have only 2 or 3 replicates, so my reading of the literature is that p-values are not very useful here.)
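
To make the procedure above concrete, here is a minimal sketch in Python with NumPy. It is not the actual analysis (which was done with the affy package in R). The function names, the log2 fold-change threshold of 1.0, and the toy matrix are all assumptions made up for illustration. It assumes the data have already been summarized to log2 expression values, with one phenotype sample and one or more control samples per tissue.

```python
import numpy as np

def log2_fold_change(phenotype, controls):
    """Log2 fold change for one tissue comparison.

    phenotype: shape (n_genes,), log2 expression of the phenotype sample.
    controls:  shape (n_genes, n_controls), log2 expression of the
               samples from the same tissue that lack the phenotype.
    """
    return phenotype - controls.mean(axis=1)

def signature_all_tissues(log_fc, threshold=1.0):
    """Genes passing the threshold in every tissue comparison.

    log_fc: shape (n_genes, n_tissues), one column per comparison.
    Returns two boolean masks: (up in all tissues, down in all tissues).
    The threshold of 1.0 (a 2-fold change) is an illustrative assumption.
    """
    up = (log_fc >= threshold).all(axis=1)
    down = (log_fc <= -threshold).all(axis=1)
    return up, down

# Toy data (invented): 4 genes x 3 tissue comparisons.
log_fc = np.array([
    [ 1.5,  2.0,  1.2],   # up in all three comparisons
    [ 1.5,  2.0,  0.9],   # just below threshold in one comparison
    [-1.1, -1.3, -2.0],   # down in all three comparisons
    [ 0.1, -0.2,  0.3],   # no consistent change
])
up, down = signature_all_tissues(log_fc, threshold=1.0)
print("up in all tissues:  ", np.flatnonzero(up))
print("down in all tissues:", np.flatnonzero(down))
```

Note that the second gene is dropped even though it misses the threshold in only one comparison, and only narrowly. That is the limitation of a strict intersection.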

It seems to me there must be a more statistically valid approach, or one that somehow weighs the degree of correlation across all comparisons and doesn't necessarily exclude a gene that is strongly correlated in all but one comparison (where it might be just below the threshold, for example).
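
One simple relaxation along these lines, sketched in Python with NumPy, is to count how many comparisons pass the threshold rather than requiring all of them, while still insisting that the direction of change is the same everywhere. This is only an illustrative heuristic, not an established or statistically validated method. The function name, the `max_misses` parameter, the threshold of 1.0, and the toy data are all hypothetical.

```python
import numpy as np

def relaxed_signature(log_fc, threshold=1.0, max_misses=1):
    """Genes passing the threshold in all but `max_misses` comparisons.

    log_fc: shape (n_genes, n_tissues), log2 fold change per comparison.
    A gene still has to change in the same direction in every
    comparison; it is only allowed to fall short of the threshold.
    Also returns the mean log2 fold change as a crude ranking score.
    """
    n_tissues = log_fc.shape[1]
    needed = n_tissues - max_misses

    n_up = (log_fc >= threshold).sum(axis=1)
    n_down = (log_fc <= -threshold).sum(axis=1)

    up = (n_up >= needed) & (log_fc > 0).all(axis=1)
    down = (n_down >= needed) & (log_fc < 0).all(axis=1)
    mean_fc = log_fc.mean(axis=1)
    return up, down, mean_fc

# Toy data (invented): 4 genes x 3 tissue comparisons.
log_fc = np.array([
    [ 1.5,  2.0,  1.2],   # passes in all comparisons
    [ 1.5,  2.0,  0.9],   # narrowly misses in one comparison
    [-1.1, -1.3, -2.0],   # down in all comparisons
    [ 0.1, -0.2,  0.3],   # no consistent change
])
up, down, mean_fc = relaxed_signature(log_fc, threshold=1.0, max_misses=1)
print("up (relaxed):  ", np.flatnonzero(up))
print("down (relaxed):", np.flatnonzero(down))
```

With these settings the gene that narrowly misses in one comparison is retained. The choice of `max_misses` is arbitrary, and the sketch does nothing to address the lack of replicates.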

Any advice or pointers to relevant papers or approaches would be greatly appreciated.

John 

C John Luckey, MD PhD

Resident - Clinical Pathology - Brigham and Women’s Hospital

Post Doctoral Fellow – Diane Mathis/Christophe Benoist Lab - Joslin Diabetes Center

One Joslin Place, Rm. 474

Boston, MA  02215

phone: (617) 264-2783

fax:     (617) 264-2744

e-mail: john.luckey at joslin.harvard.edu