[BioC] Random effects and variance components

Mon Jun 1 18:21:32 CEST 2009

Have a look at MAANOVA. --Naomi

At 09:49 AM 6/1/2009, James W. MacDonald wrote:
>Hi Paolo,
>
>I don't think you can fit the model you describe using limma, and I 
>really don't think you can get the variance components. If you want 
>to fit more sophisticated mixed models, you will likely need to use 
>the lme4 package, and the lmer() function in particular.
>
>But note that this route will require much more work on your part, 
>both in understanding how lme4 and lmer() work, as well as writing 
>the code to fit individual models to each reporter and extracting the results.
>
>Best,
>
>Jim
>
>
>
>Paolo Innocenti wrote:
>>Dear Gordon and list,
>>thanks for the previous help, it was indeed helpful. Nonetheless, 
>>even after some strolling (and independent trials-and-errors), I am 
>>still stuck on this issue:
>>we came to the conclusion that the simplest good model for our affy 
>>experiment is the following:
>>design <- model.matrix(~ sex*line, data=pData(data))
>>sex: M/F,
>>line: 15 levels (different clones)
>>8 biological replicates for each line (4 for each sex)
>>The issue here is that the "line" factor should be treated as 
>>random. First, because each line is a haplotype randomly picked 
>>from a large outbred population. Second, because we would like to 
>>estimate, from the variance components, the heritability of the 
>>transcript (the variance explained by the "line" term can be 
>>approximated to the genetic variance).
>>Gordon Smyth wrote:
>>"Finally, you can add the biolrep as a random effect using the
>>duplicateCorrelation() function with block argument, as explained in the
>>limma User's Guide, but I am not convinced yet that this is absolutely
>>necessary for your experiment."
>>I am not sure I understand what you mean here, but the random 
>>effect I want to include is "line", not the biological replicate 
>>(that is numbered 1:4 in each line/sex, but it says nothing about 
>>relationships between each 1, each 2, etc...)
>>So, the two questions are:
>>1) How to treat "line" as random. I appreciate that this issue is 
>>explained in chapter 8.2 of the limma user's guide, but I still 
>>don't get how to fit the interaction and how to include my random 
>>effect in my contrast, or in my toptable. For instance:
>>design <- model.matrix(~ sex*line, data=pData(data))
>>randomline <- duplicateCorrelation(eset, design,
>>     block=rep(1:15,each=8))
>>fit <- lmFit(eset, design, block=rep(1:15, each=8),
>>         cor=randomline$consensus)
>>I get this error:
>>Error in chol.default(V) :
>>   the leading minor of order 2 is not positive definite
>>Probably because the block is exactly the same vector as the line, 
>>already included in the design.
>>On the other hand, if I fit:
>>design <- model.matrix(~ sex, data=pData(data))
>>randomline <- duplicateCorrelation(eset, design,
>>     block=rep(1:15,each=8))
>>fit <- lmFit(eset, design, block=rep(1:15, each=8),
>>         cor=randomline$consensus)
>>There is no line:sex interaction, and the only effect I can obtain 
>>is the effect of sex.
>>How can I get the effect of the line, and of the sex:line interaction?
>>2) Where can I find the variance components of my random effect?
>>Thanks in advance for any help,
>>paolo
>>
>>>>sessionInfo()
>>>R version 2.9.0 (2009-04-17) x86_64-unknown-linux-gnu
>>>locale:
>>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C 
>>>
>>>
>>>attached base packages:
>>>[1] tcltk     stats     graphics  grDevices utils     datasets
>>>methods  [8] base
>>>other attached packages:
>>>[1] statmod_1.4.0  qvalue_1.18.0  limma_2.18.1   affy_1.22.0
>>>Biobase_2.4.1 [6] maanova_1.14.0
>>>
>>>loaded via a namespace (and not attached):
>>>[1] affyio_1.12.0        preprocessCore_1.6.0 tools_2.9.0
>>Gordon K Smyth wrote:
>>>Dear Paolo,
>>>
>>>As Naomi Altman as already told you, analysing an experiment such 
>>>as this is straightforward with limma.  I guess the problem you 
>>>are having is that you are trying to use the limma User's Guide's 
>>>suggestion of forming a composite factor out of the individual 
>>>factors (called the group means parametrization), and you don't 
>>>know how to define contrasts for interactions from this 
>>>factor.  This does become a little more involved for experiments 
>>>with more factors.  Can I suggest that you instead make use of the 
>>>factorial formulae in R when you make up the design matrix, then 
>>>you can probably dispense with the contrast step altogether.
>>>
>>>You could for example use
>>>
>>>   targets <- read.delim("targets.txt")
>>>   design <- model.matrix(~Batch+Sex*(Phen/Line), data=targets)
>>>
>>>This will produce a design matrix with the following columns.
>>>
>>>   > colnames(design)
>>>  [1] "(Intercept)"     "Batch"           "SexM"
>>>  [4] "PhenH"           "PhenL"           "PhenA:Line"
>>>  [7] "PhenH:Line"      "PhenL:Line"      "SexM:PhenH"
>>>[10] "SexM:PhenL"      "SexM:PhenA:Line" "SexM:PhenH:Line"
>>>[13] "SexM:PhenL:Line"
>>>
>>>To find genes significant for the sex x line interaction, you can simply use
>>>
>>>   fit <- lmFit(eset, design)
>>>   fit <- eBayes(fit)
>>>   topTable(fit, coef=9:13)
>>>
>>>On the other hand,
>>>
>>>   topTable(fit, coef=9:10)
>>>
>>>is the sex x phen interaction.
>>>
>>>Finally, you can add the biolrep as a random effect using the 
>>>duplicateCorrelation() function with block argument, as explained 
>>>in the limma User's Guide, but I am not convinced yet that this is 
>>>absolutely necessary for your experiment.
>>>
>>>Can I also suggest that you stroll over to the mathematics 
>>>department at Uppsala and talk to someone interested in 
>>>bioinformatics and microarray analysis, say Professor Tom Britton, 
>>>and see if you can get ongoing help with statistics and design issues.
>>>
>>>Best wishes
>>>Gordon
>>>
>>>
>>>>Date: Mon, 04 May 2009 14:09:14 +0200
>>>>From: Paolo Innocenti <paolo.innocenti at ebc.uu.se>
>>>>Subject: Re: [BioC] Yet another nested design in limma
>>>>Cc: AAA - Bioconductor <bioconductor at stat.math.ethz.ch>
>>>>
>>>>Hi all,
>>>>
>>>>since I received a few emails in my mailbox by people interested in a
>>>>solution for this design (or a design similar to this one), but there is
>>>>apparently no (easy) solution in limma, I was wondering if anyone could
>>>>suggest a package for differential expression analysis that allows
>>>>dealing with:
>>>>
>>>>nested designs,
>>>>random effects,
>>>>multiple factorial designs with more than 2 levels.
>>>>
>>>>I identified siggenes, maanova, factDesign that could fit my needs, but
>>>>I would like to have a comment by someone with more experience before
>>>>diving into a new package.
>>>>
>>>>Best,
>>>>paolo
>>>>
>>>>
>>>>
>>>>Paolo Innocenti wrote:
>>>>>Hi Naomi and list,
>>>>>
>>>>>some time ago I asked a question on how to model an experiment in limma.
>>>>>I think I need some additional help with it as the experiment grew in
>>>>>complexity. I also added a factor "batch" because the arrays were run in
>>>>>separate batches, and I think would be good to control for it.
>>>>>The dataframe with phenotypic informations ("dummy") looks like this:
>>>>>
>>>>> >>         Phen    Line    Sex     Batch     BiolRep
>>>>> >> File1   H       1       M       1         1
>>>>> >> File2   H       1       M       1         2
>>>>> >> File3   H       1       M       2         3
>>>>> >> File4   H       1       M       2         4
>>>>> >> File5   H       1       F       1         1
>>>>> >> File6   H       1       F       1         2
>>>>> >> File7   H       1       F       2         3
>>>>> >> File8   H       1       F       2         4
>>>>> >> File9   H       2       M       1         1
>>>>> >> File10  H       2       M       1         2
>>>>> >> File11  H       2       M       2         3
>>>>> >> File12  H       2       M       2         4
>>>>> >> File13  H       2       F       1         1
>>>>> >> File14  H       2       F       1         2
>>>>> >> File15  H       2       F       2         3
>>>>> >> File16  H       2       F       2         4
>>>>> >> File17  L       3       M       1         1
>>>>> >> File18  L       3       M       1         2
>>>>> >> File19  L       3       M       2         3
>>>>> >> File20  L       3       M       2         4
>>>>> >> File21  L       3       F       1         1
>>>>> >> File22  L       3       F       1         2
>>>>> >> File23  L       3       F       2         3
>>>>> >> File24  L       3       F       2         4
>>>>> >> File25  L       4       M       1         1
>>>>> >> File26  L       4       M       1         2
>>>>> >> File27  L       4       M       2         3
>>>>> >> File28  L       4       M       2         4
>>>>> >> File29  L       4       F       1         1
>>>>> >> File30  L       4       F       1         2
>>>>> >> File31  L       4       F       2         3
>>>>> >> File32  L       4       F       2         4
>>>>> >> File33  A       5       M       1         1
>>>>> >> File34  A       5       M       1         2
>>>>> >> File35  A       5       M       2         3
>>>>> >> File36  A       5       M       2         4
>>>>> >> File37  A       5       F       1         1
>>>>> >> File38  A       5       F       1         2
>>>>> >> File39  A       5       F       2         3
>>>>> >> File40  A       5       F       2         4
>>>>> >> File41  A       6       M       1         1
>>>>> >> File42  A       6       M       1         2
>>>>> >> File43  A       6       M       2         3
>>>>> >> File44  A       6       M       2         4
>>>>> >> File45  A       6       F       1         1
>>>>> >> File46  A       6       F       1         2
>>>>> >> File47  A       6       F       2         3
>>>>> >> File48  A       6       F       2         4
>>>>>
>>>>>In total I have
>>>>>Factor "Phen", with 3 levels
>>>>>Factor "Line", nested in Phen, 6 levels
>>>>>Factor "Sex", 2 levels
>>>>>Factor "Batch", 2 levels
>>>>>
>>>>>I am interested in:
>>>>>
>>>>>1) Effect of sex (M vs F)
>>>>>2) Interaction between "Sex" and "Line" (or "Sex" and "Phen")
>>>>>
>>>>>Now, I can't really come up with a design matrix (not to mention the
>>>>>contrast matrix).
>>>>>
>>>>>Naomi Altman wrote:
>>>>>>You can design this in limma quite readily.  Nesting really just means
>>>>>>that only a subset of the possible contrasts are of interest.  Just
>>>>>>create the appropriate contrast matrix and you are all set.
>>>>>
>>>>>I am not really sure with what you mean here. Should I treat all the
>>>>>factors as in a factorial design?
>>>>>I might do something like this:
>>>>>
>>>>>phen <- factor(dummy$Phen)
>>>>>line <- factor(dummy$Line)
>>>>>sex <- factor(dummy$Sex)
>>>>>batch <- factor(dummy$Batch)
>>>>>fact <- factor(paste(sex,phen,line,sep="."))
>>>>>design <- model.matrix(~ 0 + fact + batch)
>>>>>colnames(design) <- c(levels(fact), "batch2")
>>>>>fit <- lmFit(dummy.eset,design)
>>>>>contrast <- makeContrasts(
>>>>>         sex= (F.H.1 + F.H.2 + F.L.3 + F.L.4 + F.A.5 + F.A.6) - (M.H.1 +
>>>>>M.H.2 + M.L.3 + M.L.4 + M.A.5 + M.A.6),
>>>>>         levels=design)
>>>>>fit2 <- contrasts.fit(fit,contrast)
>>>>>fit2 <- eBayes(fit2)
>>>>>
>>>>>In this way I can correctly (I presume) obtain the effect of sex, but
>>>>>how can I get the interaction term between sex and line?
>>>>>I presume there is a "easy" way, but I can't see it...
>>>>>
>>>>>Thanks,
>>>>>paolo
>>>>>
>>>>>
>>>>>>
>>>>>>--Naomi
>>>>>>
>>>>>>At 12:08 PM 2/16/2009, Paolo Innocenti wrote:
>>>>>>>Hi all,
>>>>>>>
>>>>>>>I have an experimental design for a Affy experiment that looks like
>>>>>>>this:
>>>>>>>
>>>>>>>         Phen    Line    Sex     Biol.Rep.
>>>>>>>File1   H       1       M       1
>>>>>>>File2   H       1       M       2
>>>>>>>File3   H       1       F       1
>>>>>>>File4   H       1       F       2
>>>>>>>File5   H       2       M       1
>>>>>>>File6   H       2       M       2
>>>>>>>File7   H       2       F       1
>>>>>>>File8   H       2       F       2
>>>>>>>File9   L       3       M       1
>>>>>>>File10  L       3       M       2
>>>>>>>File11  L       3       F       1
>>>>>>>File12  L       3       F       2
>>>>>>>File13  L       4       M       1
>>>>>>>File14  L       4       M       2
>>>>>>>File15  L       4       F       1
>>>>>>>File16  L       4       F       2
>>>>>>>
>>>>>>>
>>>>>>>This appears to be a slightly more complicated situation than the one
>>>>>>>proposed in the section 8.7 of the limma users guide (p.45) or by
>>>>>>>Jenny on this post:
>>>>>>>
>>>>>>>https://stat.ethz.ch/pipermail/bioconductor/2006-February/011965.html
>>>>>>>
>>>>>>>In particular, I am intersted in
>>>>>>>- Effect of "sex" (M vs F)
>>>>>>>- Interaction between "sex" and "phenotype ("line" nested)
>>>>>>>- Effect of "phenotype" in males
>>>>>>>- Effect of "phenotype" in females
>>>>>>>
>>>>>>>Line should be nested in phenotype, because they are random "strains"
>>>>>>>that happened to end up in phenotype H or L.
>>>>>>>
>>>>>>>Can I design this in limma? Is there a source of information about
>>>>>>>how to handle with this? In particular, can I design a single model
>>>>>>>matrix and then choose the contrasts I am interested in?
>>>>>>>
>>>>>>>Any help is much appreciated,
>>>>>>>paolo
>>>>>>>
>>>>>>>
>>>>>>>--
>>>>>>>Paolo Innocenti
>>>>>>>Department of Animal Ecology, EBC
>>>>>>>Uppsala University
>>>>>>>Norbyv?gen 18D
>>>>>>>75236 Uppsala, Sweden
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioconductor mailing list
>>>>>>>Bioconductor at stat.math.ethz.ch
>>>>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>Search the archives:
>>>>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>>Naomi S. Altman                                814-865-3791 (voice)
>>>>>>Associate Professor
>>>>>>Dept. of Statistics                              814-863-7114 (fax)
>>>>>>Penn State University                         814-865-1348 (Statistics)
>>>>>>University Park, PA 16802-2111
>>>>>>
>>>>
>>>>--
>>>>Paolo Innocenti
>>>>Department of Animal Ecology, EBC
>>>>Uppsala University
>>>>Norbyv?gen 18D
>>>>75236 Uppsala, Sweden
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Douglas Lab
>University of Michigan
>Department of Human Genetics
>5912 Buhl
>1241 E. Catherine St.
>Ann Arbor MI 48109-5618
>734-615-7826
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111