[BioC] Limma time course analysis - defining comparisons

Tue Jun 17 19:01:01 CEST 2008

Dear James and Others

I have followed your suggestion from last week, about
trying to define all contrasts in a time course
analysis, and then using the nestedF option. However,
I am afraid that I am still a bit confused. 

I run the analysis shown below, and try decideTests
both with and without method="nestedF". The rsult is
also shown below, but too summarize it is:

Without nested F:
summary(results)
#   h24-h12 h12-h4 h24-h4
#-1      87      2    237
#0    21714  22280  21266
#1      482      1    780

With nested F
summary(resultsF)
#   h24-h12 h12-h4 h24-h4
#-1      94     41    159
#0    21706  22210  21547
#1      483     32    577

I am not sure why I see such a big (relative) change
for 12hour versus 4hour. In both cases I have a direct
comparison. Acording to the help page for
classifyTestsF that is being called, it says:

"classifyTestsF uses a nested F-test approach giving
particular attention to correctly classifying genes
which have two or more significant t-statistics, i.e.,
are differential expressed under two or more
conditions. For each row of tstat, the overall
F-statistics is constructed from the t-statistics as
for FStat. At least one constrast will be classified
as significant if and only if the overall F-statistic
is significant. If the overall F-statistic is
significant, then the function makes a best choice as
to which t-statistics contributed to this result. The
methodology is based on the principle that any
t-statistic should be called significant if the F-test
is still significant for that row when all the larger
t-statistics are set to the same absolute size as the
t-statistic in question."

I think I understand that. But I am surprised that it
makes such a big difference. The difference is even
bigger with my real data set with 6 time points,
compared to the data from bioconductor that I am using
here. Also, which of the two approaches is more
trustworthy in the opinion of the bioconductor-list
users? With or without nestedF?

Thank you
Helen

# All commands
library(bronchialIL13)
data(HAHrma)
# Just for the IL13 samples
data <- HAHrma[,7:15]
# Design
targets	<-
c("h12","h12","h12","h24","h24","h24","h4","h4","h4")
lev	<- c("h12","h24","h4")
f	<- factor(targets, levels=lev)
design	<- model.matrix(~0+f)
colnames(design)	<- lev
fit <- lmFit(data, design)
# 3-step contrasts, used to directly get all
comparisons
contrasts <- makeContrasts("h24-h12", "h12-h4",
"h24-h4", levels=design)
fit2 <- contrasts.fit(fit, contrasts)
fit2 <- eBayes(fit2)
# results
results	<- decideTests(fit2, adjust="BH",
p.value=0.01)
resultsF	<- decideTests(fit2, method="nestedF",
adjust="BH", p.value=0.01)
summary(results)
#   h24-h12 h12-h4 h24-h4
#-1      87      2    237
#0    21714  22280  21266
#1      482      1    780
summary(resultsF)
#   h24-h12 h12-h4 h24-h4
#-1      94     41    159
#0    21706  22210  21547
#1      483     32    577

--- "James W. MacDonald" <jmacdon at med.umich.edu>
wrote:

> Hi Helen,
> 
> Helen Zhou wrote:
> > Dear Jim
> > 
> > Thanks you for your answer. As I understand you
> are
> > recommending me to do direct comparisons between
> for
> > example 12-4h and 24-12h. However, could there not
> in
> > theory be a gene that was up a little bit in
> 12vs4h
> > and 24vs12h, so that the difference for neither of
> > these would be large enough to be significant, but
> for
> > 24vs4h the combined change might be significant?
> In
> > that case I guess I would need all 3 comparisons?
> 
> Yes. The 'standard' linear modeling strategy is to
> fit the model and see 
> if you have a significant F-statistic, then come
> back and do contrasts 
> to see which particular comparison(s) is/are
> contributing to that 
> significance. The general idea being that you want
> to limit the number 
> of comparisons you are making so multiplicity is
> less of a problem.
> 
> However, with microarray data this isn't usually
> what people do. 
> Instead, they usually try to minimize the
> multiplicity by filtering out 
> uninteresting reporters prior to making comparisons
> (based on variance, 
> P/M/A calls, IQR, etc) and then computing all
> contrasts that they are 
> interested in with the reporters that remain.
> 
> You might consider using decideTests() using the
> 'nestedF' option - this 
> is intended to correctly detect differential
> expression when two or more 
> contrasts are significant, so in the case you
> mention may classify all 
> three contrasts as significant.
> 
> Best,
> 
> Jim
> 
> 
> > 
> > Thanks you
> > Helen
> > 
> > --- "James W. MacDonald" <jmacdon at med.umich.edu>
> > wrote:
> > 
> >> Hi Helen,
> >>
> >> Helen Zhou wrote:
> >>> Dear Sir/Madam
> >>>
> >>> I am trying to analyse a short time series,
> >> roughly
> >>> following section 8.8 in the limma user guide. I
> >> am
> >>> interested in differences between all time
> points.
> >> I
> >>> am not sure whether I have to make all the
> >> pariwise
> >>> comparisons directly, or whether they can be
> done
> >>> indirectly as well.
> >>>
> >>> For example, if I want to compare to time
> points,
> >> what
> >>> is the differences between the two methods
> listed
> >>> below.
> >>>
> >>> library(bronchialIL13)
> >>> # Just for the IL13 samples
> >>> data <- HAHrma[,7:15]
> >>> # Design
> >>> targets	<-
> >>>
> >
>
c("h12","h12","h12","h24","h24","h24","h4","h4","h4")
> >>> lev	<- c("h12","h24","h4")
> >>> f	<- factor(targets, levels=lev)
> >>> design	<- model.matrix(~0+f)
> >>> colnames(design)	<- lev
> >>> fit <- lmFit(data, design)
> >>> # 2-step contrasts, used to indirectly get 24 to
> 4
> >>> hours as well as the other two comparisons
> >>> contrasts <- makeContrasts("h24-h12", "h12-h4",
> >>> levels=design)
> >>> fit2 <- contrasts.fit(fit, contrasts)
> >>> fit2 <- eBayes(fit2)
> >>> # Direct contrast of 24 to 4 hours
> >>> contrasts2 <- makeContrasts("h24-h4",
> >> levels=design)
> >>> fit3 <- contrasts.fit(fit, contrasts2)
> >>> fit3 <- eBayes(fit3)
> >>> # Comparison
> >>> topTable(fit2, coef=1:2)
> >>> topTable(fit3, coef=1)
> >> In the first case you are asking the question
> 'which
> >> reporters are 
> >> different in either h24 vs h4 _or_ h12 vs h4',
> >> whereas in the second 
> >> case you are asking 'which reporters are
> different
> >> between H24 and h4'.
> >>
> >> It is entirely possible that you could have a
> gene
> >> that isn't different 
> >> between h24 and h4, but is different at h12. This
> >> would show up in the 
> >> first comparison but not the second, so if you
> want
> >> to compare time 
> >> points you are better off making direct contrasts
> >> rather than using the 
> >> F-statistic for multiple contrasts (which will
> then
> >> require the 
> >> additional step of figuring out which contrast(s)
> >> caused the statistic 
> >> to be significant).
> >>
> >> Best,
> >>
> >> Jim
> >>
> >>
> >>> More or less the same probe sets are present,
> but
> >> in
> >>> different order and with different p values. Is
> >> the
> >>> difference because using coef=1:2 will go via an
> >>> F-test? And if I want the change from 24h-0h as
> >> well
> >>> as 42-12h and 12-4h, is it most correct for me
> to
> >>> specify that contrast directly? In my actual
> >>> experiment I have 4 time points, so will it be
> >> enough
> >>> for me with 3 possible comparisons, or will I
> have
> >> to
> >>> write all the 6 possible combinations?
> >>>
> >>> Thank you in advance for all your help.
> >>>
> >>> Yours truly
> >>> Mrs Helen Zhou
> >>>
> >>> P.S. I think this might have been mentioned on
> the
> >>> list before, but I could not find the email. In
> >> that
> >>> case, please excuse me for repeating this.
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at stat.math.ethz.ch
> >>>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
> >
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >> -- 
> >> James W. MacDonald, M.S.
> >> Biostatistician
> >> Affymetrix and cDNA Microarray Core
> >> University of Michigan Cancer Center
> >> 1500 E. Medical Center Drive
> >> 7410 CCGC
> >> Ann Arbor MI 48109
> >> 734-647-5623
> >>
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?

> protection around 
> > http://mail.yahoo.com 
> > 
> 
> 
=== message truncated ===