[BioC] testing triangular setup for finding instances of "all are different"

Wolfgang Raffelsberger wraff at igbmc.fr
Mon Oct 12 13:45:02 CEST 2009

Dear List,

I'm trying to find out the best procedure to treat the following 
"triangular" experimental plan.
Three groups of samples (prepared as triplicates and run on 3+3+3=9 
microarrays) are to be compared in the way where those elements/genes 
that are different in each of the three groups should be identified (ie, 
(av1 != av2) & (av1 != av3) & (av2 != av3) )

If I run an ANOVA the F value won't tell me if one or two groups are 
different.  From the literature I see that people then like to resolves 
this again as multiple pair-wise comparisons to find out which 
group-means are actually different ....
Of course the simples way might be to run three (independent) pair-wise 
comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). 
However, when looking at the resulting p-values I'm not taking in 
account the extra multiplicity of testing. (Of course, before testing 
I'd check the data by QC and run some filtering of the data). A 
work-around might be to append all p-values to a 3 * length(genes) 
vector that could be used to estimate FDR or local fdr and finally 
search elements/genes where all FDRs are very low (I suppose I'd have to 
use the max(FDR) for a given element/gene)
On this list I've seen a post for a somehow similar (?) case ("Re: 
[BioC] result of linear model", 12 june 2009) suggesting to either look 
at a "p.value/ratio threshold in limma", but I don't think this would 
address my questions satisfyingly. Looking at the ratio of only 2 
pair-wise comparisons it might happen that (av1 != av2) and (av1 != 
av3), but if (av2 = av3) the conclusion that all three are different may 
be wrong ...

So, I'm asking myself if there are more direct and elegant approaches to 
this problem. I remember that in related fields a constellation called 
"trio" is somehow popular, and I'm sure others may have already thought 
about solutions for such a setup.
Finally, the questions remains if it is possible to develop one (single) 
FDR value for estimating that a given gene might be different in each of 
the three groups.

Any hints/suggestions ?

Thank's in advance,
Wolfgang Raffelsberger

# Here an example to illustrate ...
 dat <- matrix(runif(180),nc=9)
 rownames(dat) <- paste("g",1:nrow(dat),sep="_")
 dat[1,] <- dat[1,] + rep(1:3,each=3)       # true positive
 dat[2,] <- dat[2,] + 2*rep(1:3,each=3)     # another true positive

 groups <- gl(3,3,labels=letters[1:3])      # organize the data in 3 groups

# basic testing in limma (emprical Bayes)
 (design1 <- model.matrix( ~ -1 + groups ))
 (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
 fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
   topTable(fit1, coef=1,number=4)        # the best of 1st pairwise 

 # append p-values for correcting them all together and then rearrange 
in origial setup
 combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), 
nc=3)   # or your favorite FDR estimation method
   rownames(combFDR) <- rownames(dat)
 combFDR[ order(apply(combFDR,1,min))[1:10],]        # the best FDR results
 # and then the 'old' question of finding/defining a suitable threshold 
for the FDR values remains to be addressed ...

 # plot the best one ...
 stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups)

# for completeness :
 >  sessionInfo()
R version 2.9.1 (2009-06-26)


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] lattice_0.17-25 limma_2.18.2  

loaded via a namespace (and not attached):
[1] grid_2.9.1  tools_2.9.1

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  France
Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr

More information about the Bioconductor mailing list