[BioC] When to treat technical reps as biological reps? WAS:Re: 2x2 factorial loop without common reference (pool)
Jenny Drnevich
drnevich at uiuc.edu
Wed Apr 26 18:00:45 CEST 2006
Hi everyone,
Comments from Naomi and Gordon (below) about the technical replication in
the 2x2 factorial loop experiment are very close to an issue I have been
struggling with for several analyses: When (if ever) is it OK to treat
technical replicates as biological replicates? Often this is done when
there is more than one random effect (e.g. duplicate spots, blocking
effects, etc.) because, as Gordon has said previously, the between-gene
smoothing of limma cannot currently be done with more than one random
effect. I know there have been many discussions on this on the list
previously, but I can see two problems with treating tech reps as
biological reps, and only one of them has been addressed:
1. There is likely to be artificially decreased variance within treatment
groups because tech reps should have higher correlations than biological
reps. This problem has been addressed several times and probably the best
answer has come from Gordon along the lines of: often measurement error is
larger than biological variation, so IF there are not higher correlations
among tech reps then variance estimates should not be artificially decreased.
2. The DF is artificially increased due to pseudoreplication of the
biological replicates, which leads to artificially lower p-values. In my
experience, this combined with even minor changes to the variance
components can lead to large changes in p-values.
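
To make the two problems concrete, here is a rough base-R sketch, assuming
a single gene, two groups with no true difference, 3 biological reps per
group and 4 technical reps of each. The variance components (sd_bio,
sd_tech) and all object names are invented for illustration and are not
estimated from any real arrays. It compares a t-test that treats all 12
values per group as independent with one that first averages the technical
reps.

```r
# Hypothetical illustration: no true group difference, 3 biological reps per
# group, 4 technical reps of each. Variance components are assumptions.
set.seed(1)
n_bio <- 3; n_tech <- 4
sd_bio <- 1; sd_tech <- 1

one_gene <- function() {
  sim_group <- function() {
    bio <- rnorm(n_bio, sd = sd_bio)                 # biological effects
    matrix(rep(bio, each = n_tech) + rnorm(n_bio * n_tech, sd = sd_tech),
           nrow = n_tech)                            # one column per bio rep
  }
  g1 <- sim_group(); g2 <- sim_group()
  c(naive    = t.test(as.vector(g1), as.vector(g2), var.equal = TRUE)$p.value,
    averaged = t.test(colMeans(g1), colMeans(g2), var.equal = TRUE)$p.value)
}

pvals <- replicate(2000, one_gene())
false_pos <- rowMeans(pvals < 0.05)  # a valid test should give about 0.05
print(false_pos)

# Residual DF and two-sided 5% critical t under each treatment of the data
df <- c(naive = 2 * n_bio * n_tech - 2, averaged = 2 * n_bio - 2)
print(rbind(df = df, crit_t = qt(0.975, df)))
```

Setting sd_bio to a small value relative to sd_tech shows how the gap
between the two false-positive rates depends on the assumed variance
components.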
As far as I know, this second problem has not been addressed. As a case in
point, in the 2x2 factorial loop from before, each of the three biological
replicates has 4 technical replicates, and even if there are not higher
correlations, treating them as biological reps yields N=12 for each group
instead of N=3. Shouldn't we be worried about this effect as well? In such
cases, when the experimental design really has more than one random effect,
wouldn't it be better to model the random effects properly with a
multilevel model such as lme/nlme rather than get the benefits of the
empirical Bayes shrinkage, either by ignoring technical replication or by
averaging dye swaps?
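
For reference, a minimal sketch of what such a multilevel fit could look
like for one gene, using lme() from the nlme package. The data are
simulated, the column names (sample, group, y) are hypothetical, and array
and dye effects are deliberately left out, so this is only an outline of
the idea and not an analysis of the loop design itself.

```r
library(nlme)  # ships with R as a recommended package

# Hypothetical single-gene data: 2 groups x 3 biological reps x 4 technical
# reps. Values are simulated; the sd choices are illustrative assumptions.
set.seed(2)
d <- expand.grid(tech = 1:4, bio = 1:3, group = c("a", "b"))
d$sample <- factor(paste0(d$group, d$bio))   # biological replicate ID
bio_eff <- rnorm(nlevels(d$sample), sd = 1)
d$y <- 8 + bio_eff[as.integer(d$sample)] + rnorm(nrow(d), sd = 1)

# Random intercept per biological replicate; technical reps are the residual
fit <- lme(y ~ group, random = ~ 1 | sample, data = d)
anova(fit)    # the denDF column shows the DF used to test 'group'
VarCorr(fit)  # between-sample vs residual (technical) variance estimates
```

The denominator DF for group should reflect the six biological replicates
rather than the 24 measurements. Fitting gene by gene like this gives up
the empirical Bayes shrinkage, which is the trade-off in question.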
Thanks,
Jenny
Naomi's comment:
I would use single channel analysis for this. The only problem is that
Limma allows only 1 level of random effects. Hence, you will need to
average the dye-swaps.
Gordon's comment:
>PS. Although you don't say explicitly, I'm assuming that a1, a2 etc
>represent some sort of biological replication. The above analysis
>does not keep track of which array has which biological replicate of
>each treatment. If you wanted to do a careful job of that, you would
>have no choice but to do a "separate channel" analysis, as Naomi
>Altman has suggested separately. If your biological replicates a1, a2
>etc are not very different, compared to microarray measurement error,
>then the above simpler analysis may be good enough.
>
>Date: Sun, 23 Apr 2006 13:41:22 -0400
> >From: "francois fauteux" <francois.fauteux at gmail.com>
> >Subject: [BioC] 2x2 factorial loop without common reference (pool)
> >To: bioconductor at stat.math.ethz.ch, " François fauteux "
> > <francois.fauteux at gmail.com>, " Richard Bélanger "
> > <richard.belanger at plg.ulaval.ca>
> >Message-ID:
> > <53328b400604231041v51db3863i8bb48b2fbf725229 at mail.gmail.com>
> >Content-Type: text/plain; charset=ISO-8859-1
> >
> >Hi;
> >
> >We are doing an experiment with agilent 44K (3 biological reps,
> >complete dye-swap):
> >
> >a - control
> >b - treatment 1
> >c - treatment 2
> >d - treatment 1 + treatment 2
> >
> >and I would like to output evidence of the interaction between two
> >treatments and effect on gene expression.
> >
> >24 chips:
> >
> >SlideNumber Cy3 Cy5
> >1 a1 b1
> >2 a2 b2
> >3 a3 b3
> >4 b1 a1
> >5 b2 a2
> >6 b3 a3
> >7 a1 c1
> >8 a2 c2
> >9 a3 c3
> >10 c1 a1
> >11 c2 a2
> >12 c3 a3
> >13 b1 d1
> >14 b2 d2
> >15 b3 d3
> >16 d1 b1
> >17 d2 b2
> >18 d3 b3
> >19 c1 d1
> >20 c2 d2
> >21 c3 d3
> >22 d1 c1
> >23 d2 c2
> >24 d3 c3
> >
> >I've done several tests with limma to isolate significant results in
> >the following:
> >1- a vs b;
> >2- a vs c;
> >3- b vs d;
> >4- c vs d;
> >
> >with this "targets.txt":
> >
> >SlideNumber Cy3 Cy5
> >1 a b
> >2 a b
> >3 a b
> >4 b a
> >5 b a
> >6 b a
> >7 a c
> >8 a c
> >9 a c
> >10 c a
> >11 c a
> >12 c a
> >13 b d
> >14 b d
> >15 b d
> >16 d b
> >17 d b
> >18 d b
> >19 c d
> >20 c d
> >21 c d
> >22 d c
> >23 d c
> >24 d c
> >
> >First option:
> >
> > > f <- paste(targets$Cy3, targets$Cy5, sep = ".")
> > > f <- factor(f, levels = c("a.b", "b.a", "a.c", "c.a", "b.d",
> > "d.b", "c.d", "d.c"))
> > > design1 <- model.matrix(~0 + f)
> >
> > > design1
> > a.b b.a a.c c.a b.d d.b c.d d.c
> >1 1 0 0 0 0 0 0 0
> >2 1 0 0 0 0 0 0 0
> >3 1 0 0 0 0 0 0 0
> >4 0 1 0 0 0 0 0 0
> >5 0 1 0 0 0 0 0 0
> >6 0 1 0 0 0 0 0 0
> >7 0 0 1 0 0 0 0 0
> >8 0 0 1 0 0 0 0 0
> >9 0 0 1 0 0 0 0 0
> >10 0 0 0 1 0 0 0 0
> >11 0 0 0 1 0 0 0 0
> >12 0 0 0 1 0 0 0 0
> >13 0 0 0 0 1 0 0 0
> >14 0 0 0 0 1 0 0 0
> >15 0 0 0 0 1 0 0 0
> >16 0 0 0 0 0 1 0 0
> >17 0 0 0 0 0 1 0 0
> >18 0 0 0 0 0 1 0 0
> >19 0 0 0 0 0 0 1 0
> >20 0 0 0 0 0 0 1 0
> >21 0 0 0 0 0 0 1 0
> >22 0 0 0 0 0 0 0 1
> >23 0 0 0 0 0 0 0 1
> >24 0 0 0 0 0 0 0 1
> >
> >This gives significant results for each one of the "levels" but does
> >not take into account the dye-swap (i.e "a.b" and "b.a" are considered
> >independent).
> >
> >Other tested option is:
> > > design2 <- modelMatrix(targets,ref="a")
> >
> > > design2
> > p s sp
> >ab1 0 1 0
> >ab2 0 1 0
> >ab3 0 1 0
> >ba1 0 -1 0
> >ba2 0 -1 0
> >ba3 0 -1 0
> >ac1 1 0 0
> >ac2 1 0 0
> >ac3 1 0 0
> >ca1 -1 0 0
> >ca2 -1 0 0
> >ca3 -1 0 0
> >bd1 0 -1 1
> >bd2 0 -1 1
> >bd3 0 -1 1
> >db1 0 1 -1
> >db2 0 1 -1
> >db3 0 1 -1
> >cd1 -1 0 1
> >cd2 -1 0 1
> >cd3 -1 0 1
> >dc1 1 0 -1
> >dc2 1 0 -1
> >dc3 1 0 -1
> >
> >This gives results for "b" effect, "c" effect, and "d" effect.
> >However, I couldn't get results for the 4 comparisons of interest
> >(even though the matrix is coherent).
> >
> >Questions:
> >
> >1 - What would be the best option (design and operations) to get to
> >contrasts of interest considering that the experiment has 4
> >treatments in a factorial design without common reference (a vs b, a
> >vs c, b vs d, c vs d) and taking into account the dye-effect?
> >
> >2- Is this method (4 contrasts) the best one considering that
> >treatment "d" is a combination of treatments "b" and "c" (factorial
> >type design)? How could one directly identify genes
> >differentially expressed due to the interaction between treatment "b"
> >and treatment "c" (i.e. effect of "d" over "b" and "c")?
> >
> >In Limma Users Guide and elsewhere on this forum, I could not find a
> >clear description of how this type of analysis should be performed,
> >even though it is a simple design (i.e 2X2 factorial without a common
> >reference - two color arrays - complete dye swap).
> >
> >Thanks for your time, best regards.
> >
> >François Fauteux
> >M.Sc. student in plant biology
> >Centre de recherche en horticulture
> >Université Laval
> >francois.fauteux at gmail.com
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu