[BioC] testing triangular setup for finding instances of "all are different"

Wed Oct 14 00:23:46 CEST 2009

Dear Wolfgang,

Your triangular problem is called intersection-union testing (IUT) in the 
statistics literature.

decideTests(method="nestedF") in limma (just add the extra argument to 
Jim's example below) was designed to give a working man's solution to this 
problem.

There are more formal solutions in the statistics literature, but none of 
them gives a good FDR estimate.  Most IUT tests are very conservative. 
It's a hard problem.

Best wishes
Gordon

> Date: Mon, 12 Oct 2009 11:28:11 -0400
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] testing triangular setup for finding instances of
> 	"all are	different"
> To: wraff at igbmc.fr
> Cc: Bioconductor <bioconductor at stat.math.ethz.ch>
>
> Hi Wolfgang,
>
> Maybe I am missing your point entirely, but doesn't decideTests(fit1) do
> exactly what you want? You would have to modify your contrasts matrix
> (as the third column isn't a contrast anyway) to something like this:
>
> > cont <- cbind(contr.matr1[,-3], c(1,0,-1))
> > colnames(cont) <- c("a-b","b-c","a-c")
> > cont
>         a-b b-c a-c
> groupsa   1   0   1
> groupsb  -1   1   0
> groupsc   0  -1  -1
>
> > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont))
> > rslt <- decideTests(fit1)
> > vennDiagram(rslt)
>
> Best,
>
> Jim
>
>
>
> Wolfgang Raffelsberger wrote:
>> Dear List,
>>
>> I'm trying to find out the best procedure to treat the following
>> "triangular" experimental plan.
>> Three groups of samples (prepared as triplicates and run on 3+3+3=9
>> microarrays) are to be compared in the way where those elements/genes
>> that are different in each of the three groups should be identified (ie,
>> (av1 != av2) & (av1 != av3) & (av2 != av3) )
>>
>> If I run an ANOVA the F value won't tell me if one or two groups are
>> different.  From the literature I see that people then like to resolves
>> this again as multiple pair-wise comparisons to find out which
>> group-means are actually different ....
>> Of course the simples way might be to run three (independent) pair-wise
>> comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However,
>> when looking at the resulting p-values I'm not taking in account the
>> extra multiplicity of testing. (Of course, before testing I'd check the
>> data by QC and run some filtering of the data). A work-around might be
>> to append all p-values to a 3 * length(genes) vector that could be used
>> to estimate FDR or local fdr and finally search elements/genes where all
>> FDRs are very low (I suppose I'd have to use the max(FDR) for a given
>> element/gene)
>> On this list I've seen a post for a somehow similar (?) case ("Re:
>> [BioC] result of linear model", 12 june 2009) suggesting to either look
>> at a "p.value/ratio threshold in limma", but I don't think this would
>> address my questions satisfyingly. Looking at the ratio of only 2
>> pair-wise comparisons it might happen that (av1 != av2) and (av1 !=
>> av3), but if (av2 = av3) the conclusion that all three are different may
>> be wrong ...
>>
>> So, I'm asking myself if there are more direct and elegant approaches to
>> this problem. I remember that in related fields a constellation called
>> "trio" is somehow popular, and I'm sure others may have already thought
>> about solutions for such a setup.
>> Finally, the questions remains if it is possible to develop one (single)
>> FDR value for estimating that a given gene might be different in each of
>> the three groups.
>>
>> Any hints/suggestions ?
>>
>> Thank's in advance,
>> Wolfgang Raffelsberger
>>
>>
>> # Here an example to illustrate ...
>> set.seed(2009)
>> dat <- matrix(runif(180),nc=9)
>> rownames(dat) <- paste("g",1:nrow(dat),sep="_")
>> dat[1,] <- dat[1,] + rep(1:3,each=3)       # true positive
>> dat[2,] <- dat[2,] + 2*rep(1:3,each=3)     # another true positive
>>
>> groups <- gl(3,3,labels=letters[1:3])      # organize the data in 3 groups
>>
>> # basic testing in limma (emprical Bayes)
>> require(limma)
>> (design1 <- model.matrix( ~ -1 + groups ))
>> (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
>>   paste(colnames(design1)[2:3],collapse="-"),
>>   paste(colnames(design1)[c(1:3)],collapse="-"),
>>   levels=design1))
>> fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
>>   topTable(fit1, coef=1,number=4)        # the best of 1st pairwise
>> comparison
>>
>> # append p-values for correcting them all together and then rearrange in
>> origial setup
>> combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"),
>> nc=3)   # or your favorite FDR estimation method
>>   rownames(combFDR) <- rownames(dat)
>> combFDR[ order(apply(combFDR,1,min))[1:10],]        # the best FDR results
>> # and then the 'old' question of finding/defining a suitable threshold
>> for the FDR values remains to be addressed ...
>>
>> # plot the best one ...
>> require(lattice)
>> stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups)
>>
>>
>> # for completeness :
>> >  sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>>
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> other attached packages:
>> [1] lattice_0.17-25 limma_2.18.2
>> loaded via a namespace (and not attached):
>> [1] grid_2.9.1  tools_2.9.1
>>
>>
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> Wolfgang Raffelsberger, PhD
>> Laboratoire de BioInformatique et G?nomique Int?gratives
>> CNRS UMR7104, IGBMC,  1 rue Laurent Fries,  67404 Illkirch  Strasbourg,
>> France
>> Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
>> wolfgang.raffelsberger (at) igbmc.fr

>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826