[BioC] LIMMA:design (1, 2, 3, 3 ) , I got EXCITING results what could be the logic, since i have 2 replicates for 3rd group only?

Gordon K Smyth smyth at wehi.EDU.AU
Wed Apr 27 15:40:00 CEST 2005


> Date: Wed, 27 Apr 2005 10:18:33 +0100
> From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
> Subject: Re: [BioC] LIMMA : design (1, 2, 3, 3 ) , I got EXCITING
> 	results,	what could be the logic, since i have 2 replicates for 3rd
> 	group only ?
> To: Naomi Altman <naomi at stat.psu.edu>
> Cc: Bioconductor Bioconductor <bioconductor at stat.math.ethz.ch>

> Saurin, I agree with Naomi. Increase your sample size.
>
> You cannot rely on the results from your current design as the ONE
> sample per group may not be representative of the population you want to
> study. Moreover, others would not trust your result.
>
> Should a warning be added to LIMMA when users attempt to use a single
> array per group ?

No, I don't think it should.  There is no problem mathematically with having only one more observation
than groups, and regression programs do not normally issue warnings in this situation.  The
purpose of the software is to do the best with the data presented, not to ask the user to do
something different.

The experiment described yields 1 df for residuals.  There is actually much more that one can do
with one df for error in a microarray experiment than in an ordinary univariate experiment,
because of the possibility of pooling information across genes, and it is this pooling which I
think is surprising the original poster, Saurin Jani.
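
To make the df count and the pooling concrete, here is a minimal sketch in base R.  The simulated
data and the prior values d0 and s0sq are assumptions chosen purely for illustration; in limma the
prior is estimated from all the genes by eBayes() rather than fixed by hand.

```r
# Base R only. The simulated data and the prior (d0, s0sq) are illustrative
# assumptions, not estimates from the experiment discussed in this thread.
set.seed(1)
group  <- factor(c(1, 2, 3, 3))
design <- model.matrix(~ 0 + group)

# Residual degrees of freedom: number of arrays minus rank of the design.
df_resid <- nrow(design) - qr(design)$rank   # 4 - 3 = 1

# Simulate 1000 genes on 4 arrays with no true group differences.
y <- matrix(rnorm(1000 * 4), nrow = 1000)

# Gene-wise residual variances from the linear model (genes in rows).
res <- t(qr.resid(qr(design), t(y)))
s2  <- rowSums(res^2) / df_resid

# Moderated variance: a weighted average of the prior variance and the
# gene-wise variance, weighted by their degrees of freedom.
d0   <- 4   # assumed prior df (hypothetical value)
s0sq <- 1   # assumed prior variance (hypothetical value)
s2_post  <- (d0 * s0sq + df_resid * s2) / (d0 + df_resid)
df_total <- d0 + df_resid   # df available to the moderated statistics
```

With 1 residual df the gene-wise variances are very noisy; shrinking them towards a common value
is what gives the moderated statistics d0 + 1 df instead of 1.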

The aim of the software in this sort of situation is to do better than simply ranking the genes on
fold changes, and I think it does this.  The user is not expected to bet their house on the
correctness of the p-values; rather, to have complete confidence in the results, the user should
expect to do some independent validation.  There is a discussion at the end of

 http://bioinformatics.oupjournals.org/cgi/content/abstract/21/9/2067

on why simple models can be useful even for small experiments and in situations when the
assumptions do not hold exactly.

Gordon

> Regards, Adai
>
>
> On Tue, 2005-04-26 at 22:55 -0400, Naomi Altman wrote:
>> Significance should be based on biological replication.  If the 2 chips for
>> group 3 are technical replicates, then the variance estimate for the test
>> is probably too small.
>
>> In theory, statistical tests need only 2 replicates in a single condition,
>> as the null distribution accounts for the number of replicates.   However,
>> for this theory to hold, the normality of the samples must be pretty
>> good.  When the data are exactly normally distributed (and the assumptions
>> for limma for the distribution of variance hold) then the FDR values should
>> be pretty good, but the FNR will be poor (as you have no power).
>>
>> However, I don't think anyone believes that microarray data are normally
>> distributed.  So, I would not really trust these results, even if you have
>> a biological replicate.  Of course the 2-fold rule is even worse, so really
>> you should do more biological replication.
>>
>> --Naomi
>>
>> At 09:51 PM 4/26/2005, Saurin Jani wrote:
>> >Hi Adai,
>> >
>> >Yes, you are right. I have 4 samples :
>> >
>> >Group1 = Growth Effect for Day 1 : 1 Affy GeneChip.
>> >Group2 = Growth Effect for Day 2 : 1 Affy GeneChip.
>> >Group3 = Growth Effect for Day 3 : 2 Affy GeneChips.
>> >
>> >so, my design matrix is:
>> >design <- model.matrix(~ -1+factor(c(1,2,3,3)));
>> >
>> >LIMMA did not give any error or warning even though it has 1
>> >sample per group...! ( I thought similar thing,  since
>> >it needs technical replicates per group to make a
>> >decision). The results are very interesting. I have
>> >many genes for 0.01 FDR, which is very good.
>> >
>> >Somehow, I don't understand the logic. Do you think
>> >this is a valid design? Or do you think I should go by Fold
>> >Change Logic? Please let me know.
>> >
>> >Thank you very much,
>> >Saurin
>> >
>> >
>> >
>> >
>> >
>> >--- Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>> >wrote:
>> > > PLEASE correct me if I am wrong.
>> > >
>> > > You have a total of 4 samples that could be
>> > > classified into one of 3
>> > > groups ? How do you plan on distinguishing
>> > > biological from technical
>> > > variation ? Shouldn't limma come with some sort of
>> > > warning or error if
>> > > there is only one sample per group ?
>> > >
>> > > Regards, Adai
>> > >
>> > >
>> > >
>> > > On Tue, 2005-04-26 at 10:01 -0700, Saurin Jani
>> > > wrote:
>> > > > Hi BioC,
>> > > >
>> > > > I have 3 groups but I have only 2 replicates for
>> > > last
>> > > > group. so, group 1 and 2 has only one Affy CEL
>> > > file. I
>> > > > Did..LIMMA as below and I got some Exciting
>> > > results:
>> > > >
>> > > > #----------------------------------
>> > > > design <- model.matrix(~ -1+factor(c(1,2,3,3)));
>> > > > colnames(design) <-  c("g1","g2","g3");
>> > > > fit <- lmFit(myRMA,design);
>> > > >
>> > > > contrast.matrix <-
>> > > > makeContrasts(g1-g2,g1-g3,g2-g3,levels = design);
>> > > >
>> > > > fit2 <- contrasts.fit(fit,contrast.matrix);
>> > > > fit2 <- eBayes(fit2);
>> > > >
>> > > > results <-
>> > > > decideTests(fit2,adjust="fdr",p.value=0.01);
>> > > >
>> > > > myGenes <- geneNames(myRMA);
>> > > > i <- apply(results,c(1,2),all);
>> > > >
>> > > > a <- i[,1];
>> > > > b <- i[,2];
>> > > > c <- i[,3];
>> > > > tempgenes1 <- myGenes[a];
>> > > > tempgenes2 <- myGenes[b];
>> > > > tempgenes3 <- myGenes[c];
>> > > >
>> > > > # unique() so a gene significant in several contrasts is listed once
>> > > > tempall <- unique(c(tempgenes1,tempgenes2,tempgenes3));
>> > > > myDEGenes <- tempall;
>> > > >
>> > > > esetSub2X <- MatrixRMA[myDEGenes,];
>> > > > esetSub2 <- new("exprSet",exprs = esetSub2X);
>> > > > pData(esetSub2) <- pData(myRMA);
>> > > > heatmap(esetSub2X);
>> > > > #----------------------------------
>> > > >
>> > > > I got EXCITING results, what could be the
>> > > logic,since
>> > > > i have 2 replicates for 3rd group only ?
>> > > >
>> > > > Could anyone point me out ?
>> > > >
>> > > > I highly appreciate your help , Thank you in
>> > > advance.
>> > > >
>> > > > Thank you,
>> > > > Saurin
>> > > >
>> > > > _______________________________________________
>> > > > Bioconductor mailing list
>> > > > Bioconductor at stat.math.ethz.ch
>> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > >
>> > >
>> > >
>> >
>>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Bioinformatics Consulting Center
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111


