[BioC] Nested design in limma

Wed Apr 18 15:36:03 CEST 2007

Dear Caroline,
The key to the response about averaging is that in a purely nested, 
completely balanced design like this, with random effects at every 
level but the highest, the analysis of each factor of the design 
depends only on the averages within the levels below.  So, hypotheses 
about the differences between groups can be answered using the genewise 
averages of all the observations for each patient.
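
As a rough illustration of the averaging argument, here is a minimal Python sketch for a single gene. It is not limma code and does not reproduce limma's moderated t-statistic: the simulated data, the standard deviations, and the helper names (simulate_group, patient_averages, pooled_t) are all assumptions made up for the example. It collapses the extractions and arrays to one average per patient and then runs an ordinary two-sample t-test on the 7 + 7 patient averages.

```python
import random
from statistics import mean, variance

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

N_PATIENTS, N_EXTRACTIONS, N_ARRAYS = 7, 2, 2  # per group, as in the question


def simulate_group(group_mean):
    """Simulate one gene for one group: patients[p][e][a] = log-expression."""
    patients = []
    for _ in range(N_PATIENTS):
        patient_effect = random.gauss(0, 1.0)  # hypothetical SDs throughout
        extractions = []
        for _ in range(N_EXTRACTIONS):
            extraction_effect = random.gauss(0, 0.5)
            extractions.append([
                group_mean + patient_effect + extraction_effect
                + random.gauss(0, 0.3)
                for _ in range(N_ARRAYS)
            ])
        patients.append(extractions)
    return patients


def patient_averages(patients):
    """Average every observation (all extractions, all arrays) per patient."""
    return [mean(v for extraction in p for v in extraction) for p in patients]


def pooled_t(x, y):
    """Ordinary two-sample t-statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


group_a = simulate_group(0.0)
group_b = simulate_group(1.0)
avg_a, avg_b = patient_averages(group_a), patient_averages(group_b)
t_stat = pooled_t(avg_a, avg_b)  # 7 + 7 - 2 = 12 degrees of freedom

# Because the design is balanced, the mean of the patient averages equals
# the mean of all raw observations in the group.
all_a = [v for p in group_a for extraction in p for v in extraction]
print(abs(mean(avg_a) - mean(all_a)) < 1e-9)
print(round(t_stat, 3))
```

The test has 12 degrees of freedom, one per patient beyond the two group means, regardless of how many extractions or arrays sit underneath each patient.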

The levels of subsampling can be used to determine the main sources 
of variation in the study, which is useful for planning further 
studies, but not for testing differences between groups.  If you need 
to understand the sources of variance in your study, you could handle 
this in limma by analyzing each group separately, level by 
level.  Alternatively, you could use SAS to estimate the variance 
components for each level of replication.  I think that MAANOVA in 
Bioconductor may also do this analysis, but I have not used it.
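
For readers who want to see what estimating the variance components involves, below is a minimal Python sketch for one gene within one group, using the classical method-of-moments (ANOVA) estimators for a balanced nested design. It is not the SAS or MAANOVA procedure mentioned above; the function name and the toy numbers are hypothetical, and the estimates are deliberately left untruncated, so a negative value can occur when a level contributes little variation.

```python
from statistics import mean


def nested_variance_components(patients):
    """Method-of-moments variance components for a balanced nested design.

    patients[i][j][k] is the value for patient i, extraction j, array k,
    all from ONE group.  Every patient must have the same number of
    extractions and every extraction the same number of arrays.
    """
    a = len(patients)          # patients
    b = len(patients[0])       # extractions per patient
    n = len(patients[0][0])    # arrays per extraction

    ext_means = [[mean(ext) for ext in pat] for pat in patients]
    pat_means = [mean(m) for m in ext_means]
    grand_mean = mean(pat_means)

    # Sums of squares at each level of the hierarchy.
    ss_array = sum((v - ext_means[i][j]) ** 2
                   for i, pat in enumerate(patients)
                   for j, ext in enumerate(pat)
                   for v in ext)
    ss_ext = n * sum((ext_means[i][j] - pat_means[i]) ** 2
                     for i in range(a) for j in range(b))
    ss_pat = b * n * sum((m - grand_mean) ** 2 for m in pat_means)

    # Mean squares: divide by the degrees of freedom of each level.
    ms_array = ss_array / (a * b * (n - 1))
    ms_ext = ss_ext / (a * (b - 1))
    ms_pat = ss_pat / (a - 1)

    # Expected mean squares: E[ms_array] = s2_array,
    # E[ms_ext] = s2_array + n * s2_ext,
    # E[ms_pat] = s2_array + n * s2_ext + b * n * s2_pat.
    return {
        "array": ms_array,
        "extraction": (ms_ext - ms_array) / n,
        "patient": (ms_pat - ms_ext) / (b * n),
    }


# Toy data (made up): 3 patients x 2 extractions x 2 arrays.
toy = [[[1, 3], [2, 4]], [[5, 7], [6, 8]], [[9, 11], [10, 12]]]
components = nested_variance_components(toy)
print(components)
```

Comparing the three components across genes gives a rough sense of whether additional patients, extractions, or arrays would be the most useful replication in a follow-up study.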

--Naomi

At 10:47 PM 4/17/2007, Kasper Daniel Hansen wrote:

>On Apr 17, 2007, at 2:23 AM, <caroline.truntzer at chu-lyon.fr> wrote:
>
> > Dear list,
> > My question is a follow-up to the thread about handling nested designs
> > using limma posted by Tao Shi (please see
> > https://stat.ethz.ch/pipermail/bioconductor/2007-January/015717.html).
> > I have a data set with a design similar to Tao Shi's: 14 patients (7 in
> > one group, 7 in another), 2 biological samples for each patient
> > (corresponding to 2 different extractions), each extraction hybridized
> > to 2 arrays, and triplicate sets of probes. I would like to identify
> > genes that are differentially expressed between the 2 groups.
> > I read the responses written to Tao on how to analyse this data set,
> > but there are some things I didn't understand.
> > The advice was to use avedups() to average over the triplicate probes,
> > and then to treat the patients as biological replicates (as blocks
> > using duplicateCorrelation). But by doing so I do not understand how
> > the two other replication levels, extraction and hybridization, are
> > treated.
> > Is it possible to keep the information from these two replication
> > levels in the analysis? Is it possible to set different levels in
> > blocks (given the help for the duplicateCorrelation function I think it
> > is not possible, but perhaps someone has found a way to do that)?
> > Moreover, I think I'm confused about what should be put in the design
> > matrix and what should rather be put in the blocks vector. I'm sorry
> > for this naive question...
>
> > Thanks in advance for your help
> > Caroline
>
>This will be a quick answer. You are right that you have many levels
>of dependency in your design: 3 probes measuring the same transcript,
>2 samples per patient and 2 hybridizations per sample. That should
>(from a certain perspective) be analyzed using a model with several
>random effects (i.e. several levels of dupCor). Unfortunately limma
>cannot handle more than one level, so in that case you need to focus
>on what dependency you think is most important to model. The
>recommendations in the thread you are referring to (which I only
>skimmed _very_ quickly) essentially deal with this question.
>
>Kasper

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111