[BioC] Assess inter-study consistency

Ochsner, Scott A sochsner at bcm.tmc.edu
Thu Sep 4 19:40:17 CEST 2008


Dear BioC,

I would like to use simple correlation to assess the consistency between seven independent expression array datasets. All datasets are on the same platform, hgu133a.

In the Materials and Methods section of http://cancerres.aacrjournals.org/cgi/content/full/67/21/10296#top, the authors state:
"To assess for consistency between the three studies, Pearson correlation was computed pair-wise between the mean values of common genes. The three studies showed significant positive pair-wise correlation."
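Read literally, the quoted procedure averages each gene within each study, keeps only genes measured in every study, and runs a Pearson correlation (with a test) for every pair of studies. The following is a minimal sketch of that reading on simulated data. The study matrices, `simulate_study()`, and the other names are hypothetical, and it assumes log2-scale probe-set x array matrices with probe IDs as row names, as `exprs(eset)` would return:

```r
# Simulated stand-in for three studies (hypothetical data): each element is a
# probe-set x array matrix of log2 expression with probe IDs as row names.
set.seed(1)
gene_ids <- paste0("g", 1:1000)
true_level <- setNames(rnorm(1000, mean = 8, sd = 2), gene_ids)

simulate_study <- function(ids, n_arrays) {
  # each array = shared gene level + independent array noise
  matrix(true_level[ids] + rnorm(length(ids) * n_arrays, sd = 1),
         nrow = length(ids), dimnames = list(ids, NULL))
}

studies <- list(A = simulate_study(gene_ids[1:900], 5),
                B = simulate_study(gene_ids[51:1000], 6),
                C = simulate_study(gene_ids[101:950], 4))

# "Common genes": probe IDs measured in every study
common <- Reduce(intersect, lapply(studies, rownames))

# Mean expression of each common gene within each study (genes x studies)
means <- sapply(studies, function(m) rowMeans(m[common, , drop = FALSE]))

# Pair-wise Pearson correlation, each with a test of H0: rho = 0
pairs <- combn(colnames(means), 2)
results <- apply(pairs, 2, function(p) {
  ct <- cor.test(means[, p[1]], means[, p[2]], method = "pearson")
  c(r = unname(ct$estimate), p.value = ct$p.value)
})
colnames(results) <- paste(pairs[1, ], pairs[2, ], sep = "-")
results
```

With a single platform the `intersect()` step simply returns every probe set, so only the averaging and the pair-wise `cor.test()` calls remain.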

I'm having trouble following their statement. I don't need to worry about common genes, because all seven studies I'm looking at are on the same platform.

I thought of doing something like the following:

#eset is your standard ExpressionSet object
#treatment is a vector describing which group each array belongs to: cont. or drug within each of the seven datasets (14 groups in all).

>avg<-function(eset,treatment){
+ tmp<-aggregate(t(exprs(eset)),by=list(treatment),mean)
+ rownames(tmp)<-tmp[,1]
+ t(tmp[,-1])
+ }
>groupAverage<-avg(eset,treatment)
> dim(groupAverage)
[1] 22277    14

> cor(groupAverage)
          c.d3529   c.d3834   c.d4006   c.d4025   c.d6800   c.d8540   c.d9936   e.d3529   e.d3834   e.d4006   e.d4025   e.d6800   e.d8540
c.d3529 1.0000000 0.9659532 0.7933771 0.7498652 0.8957816 0.8874096 0.9041292 0.9917589 0.9535454 0.7964003 0.7577108 0.8889499 0.8904473
c.d3834 0.9659532 1.0000000 0.8071949 etc....
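The `avg()` helper above can also be written with `split()` and `rowMeans()`, which avoids transposing the 22,277-row matrix into a data frame. The following is a minimal sketch on a small hypothetical matrix (`x`, `grp`, `group_means`, and `avg_matrix` are illustrative names, and `x` stands in for `exprs(eset)`), checked against the `aggregate()` version from the post:

```r
# Hypothetical stand-in for exprs(eset): 5 probe sets x 6 arrays
set.seed(2)
x <- matrix(rnorm(30), nrow = 5,
            dimnames = list(paste0("probe", 1:5), paste0("array", 1:6)))
grp <- c("c.d1", "c.d1", "e.d1", "e.d1", "e.d1", "c.d2")

# Average the columns belonging to each group; result is probe sets x groups
group_means <- function(x, grp) {
  sapply(split(seq_len(ncol(x)), grp),
         function(cols) rowMeans(x[, cols, drop = FALSE]))
}

# The aggregate()-based avg() from the post, applied to a plain matrix
avg_matrix <- function(x, grp) {
  tmp <- aggregate(t(x), by = list(grp), mean)
  rownames(tmp) <- tmp[, 1]
  t(tmp[, -1])
}

gm <- group_means(x, grp)
all.equal(gm, avg_matrix(x, grp))
```

Both versions order the groups alphabetically, so `cor()` on either result gives the same matrix.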


Questions:
1. Since I expect most probe sets on these arrays not to change, shouldn't I expect high correlation even between the cont. and drug groups? In other words, how informative is computing cor() across all of the probe sets?

2. How might I assess the significance of these correlations?
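One option that makes the comparison sensitive to the drug response rather than to baseline expression is to correlate per-study drug-minus-control differences (log2 fold changes, assuming log2-scale data) instead of raw group averages. The following is a minimal sketch on simulated data laid out like `groupAverage`. It assumes the `c.` and `e.` column prefixes mark the cont. and drug groups, and `ga`, `lfc`, and the study labels are hypothetical:

```r
# Simulated group averages (hypothetical): columns "c.<study>" (control) and
# "e.<study>" (drug); only the first 100 of 2000 probe sets respond to drug.
set.seed(3)
n <- 2000
studies <- c("d1", "d2", "d3")
baseline <- rnorm(n, mean = 8, sd = 2)              # shared expression level
effect <- c(rnorm(100, sd = 1.5), rep(0, n - 100))  # shared drug effect
ga <- do.call(cbind, lapply(studies, function(s) {
  m <- cbind(baseline + rnorm(n, sd = 0.3),           # control average
             baseline + effect + rnorm(n, sd = 0.3))  # drug average
  colnames(m) <- paste0(c("c.", "e."), s)
  m
}))

# Question 1: control vs drug averages correlate highly even though 100
# probe sets change, because baseline expression dominates the variance.
cor(ga[, "c.d1"], ga[, "e.d1"])

# Per-study effect: drug minus control (log2 fold change), probe sets x studies
lfc <- sapply(studies, function(s) ga[, paste0("e.", s)] - ga[, paste0("c.", s)])
cor(lfc)

# Question 2: a test of H0: rho = 0 for one pair of studies
ct <- cor.test(lfc[, "d1"], lfc[, "d2"], method = "pearson")
ct$estimate
ct$p.value
```

Two caveats apply to the p-values. `cor.test()` treats probe sets as independent observations, which they are not (co-regulated genes, several probe sets per gene). Also, with roughly 22,000 probe sets, even a very small r is declared significant. The size of the fold-change correlation, perhaps restricted to probe sets that change in at least one study, is probably more informative than the p-value itself.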

> sessionInfo()
R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] affycoretools_1.12.0 annaffy_1.12.1       KEGG.db_2.2.0        gcrma_2.12.1         matchprobes_1.12.0   biomaRt_1.14.0      
 [7] RCurl_0.9-3          GOstats_2.6.0        Category_2.6.0       RBGL_1.16.0          GO.db_2.2.0          graph_1.18.1        
[13] limma_2.14.2         affy_1.18.1          preprocessCore_1.2.0 affyio_1.8.0         MLInterfaces_1.14.1  annotate_1.18.0     
[19] xtable_1.5-2         AnnotationDbi_1.2.1  RSQLite_0.6-8        DBI_0.2-4            rda_1.0              rpart_3.1-41        
[25] genefilter_1.20.0    survival_2.34-1      MASS_7.2-41          Biobase_2.0.1       

loaded via a namespace (and not attached):
[1] class_7.2-41    cluster_1.11.10 XML_1.95-2

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227 
