[BioC] 2 different factorial analysis codes in LIMMA give different logFC but same values for other components of TOPTABLE

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jun 26 01:07:31 CEST 2012


Dear Miriam,

It isn't necessary to send emails to my personal email address.  You've 
already asked the same question on the Bioconductor mailing list, so I've 
got your question four times now.  If you must refer to me personally, 
could you please spell my name correctly?

The logFC you get from a contrast depends on how you scale the contrast. 
A contrast of c(-2,1,1) and a contrast of c(-1,0.5,0.5) are entirely 
equivalent from the point of view of testing the null hypothesis of no 
change, but they will give contrast values that differ by a factor of two. 
Hence the logFC in limma will change by a factor of two, while the 
p-values and everything else will stay the same.

Similarly, defining a contrast by (B+C)/2-A or by B+C-2*A would be 
equivalent except for a factor of two.  They would give the same p-value 
but different logFC.

Obviously if you change the logFC but use the same fold change cutoff when 
assessing differential expression, then you will change the number of 
genes that you define as differentially expressed.  It is not apparent to 
me that you can choose FC=1.5 as a meaningful cutoff regardless of the 
meaning or definition of the contrast.

If you are not very familiar with contrasts, then just use the model A 
approach, which is clear and explicit and obviously correct.  That's why I 
recommend it in the User's Guide!

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Mon, 25 Jun 2012, Garcia Orellana,Miriam wrote:

> Dear Dr. Smith and all:
I am sorry to bother you with this matter, my understanting of the microarray anylisis is really basic and I am having hard and long time to finish the analysis of my data. Now, I can't figure out what is happening with the two models I am applying to evaluate my data, because each one of them on the TOPTABLE option when requesting the values for all the filtered genes ( 8026). Both models, (A and B) for each of my 5 contrasts are giving me the top table with the same numerical values for: AveExp, t, Pvalue, adjPvalue and B. However the logFC for contrasts 1, 2 and 3 in model B is exactly half of that in model A, while the logFC for contrasts 4 and 5 in model B is exactly one fourth of that one in model A.
Because this difference in logFC, I am getting different numbers of differentially expresed genes when using a cut off of adjPvalue lower or equal to 0.05 and a rawlogFC greater or equal to 1.5 for example for contrast 3(MR effect) it gives me 47 up- and 84 down-regulated genes with model A, while with model B it gives me only 18 up- and only 2 down-regulated genes under same cut-offs. How that can be possible if all other values are the same? and so what should I follow?

Briefly me data is a factorial design of 3 dam diets (DD: CTL, SFA, EFA) and 2 milk replacers (MR: LLA, HLA), I have three replicates for each of the interaction factors, then a total of 18 arrays. The data was filtered for informative/noninformative probes and plotted for array quality. So from a initial of 24118 bovine probes I endup with 8026 probes. My interest is to compare:
1.       Feeding FAT at prepartum= (SFA +EFA) vs CTL, with CTL as ref
2.       Feeding EFA prepartum = EFA vs SFA, with SFA as ref
3.       Feeding MR to calves= HLA vs LLA, with LLA as reference
4.       Interaction of feeding FAT by MR: (SFA +EFA) vs CTL by MR, with (SFA+EFA) vs CTL by LLA as ref
5.       Interaction of feeding EFA by MR:  EFA vs SFA by MR, with EFA vs SFA by LLA as ref

MODEL A (I created that with the guide of the LIMMA user guide for a factorial design:
TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".")
TS
TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA"))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(eset2, design, method="robust", maxit=1000)
efit <- eBayes(fit)
#Contrast results
MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2,
                                 FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2,
                                 MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3,
                                 FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA),
                                 FAbyMR=( EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA),
                                 levels=design)
fitMat<-contrasts.fit(fit,MatContrast)
Contrast.eBayes=eBayes(fitMat)

MODEL B (this model was kindly provided by Dr G. Smith):
DD <-factor(phenoDie$DD, levels = c("Ctl", "SFA", "EFA"))
MR <-factor(phenoDie$MR, levels = c("LLA", "HLA"))
contrasts (DD) <- cbind (SFAEFAvsCtl=c(-2,1,1),EFAvsSFA=c(0,-1,1))
contrasts (MR) <- c(-1,1)
design <-model.matrix (~DD*MR)
design
fit <- lmFit (eset2, design, method="robust",maxit=1000)
efit <- eBayes(fit)

Thanks so much  in advance,
Miriam

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list