[BioC] Random effects and variance components
Naomi Altman
naomi at stat.psu.edu
Mon Jun 1 18:21:32 CEST 2009
Have a look at MAANOVA. --Naomi
At 09:49 AM 6/1/2009, James W. MacDonald wrote:
>Hi Paolo,
>
>I don't think you can fit the model you describe using limma, and I
>really don't think you can get the variance components. If you want
>to fit more sophisticated mixed models, you will likely need to use
>the lme4 package, and the lmer() function in particular.
>
>But note that this route will require much more work on your part,
>both in understanding how lme4 and lmer() work, as well as writing
>the code to fit individual models to each reporter and extracting the results.
>
>Best,
>
>Jim
>
>
>
>Paolo Innocenti wrote:
>>Dear Gordon and list,
>>thanks for the previous help, it was indeed helpful. Nonetheless,
>>even after some strolling (and independent trials-and-errors), I am
>>still stuck on this issue:
>>we came to the conclusion that the simplest good model for our affy
>>experiment is the following:
>>design <- model.matrix(~ sex*line, data=pData(data))
>>sex: M/F,
>>line: 15 levels (different clones)
>>8 biological replicates for each line (4 for each sex)
>>The issue here is that the "line" factor should be treated as
>>random. First, because each line is a haplotype randomly picked
>>from a large outbred population. Second, because we would like to
>>estimate, from the variance components, the heritability of the
>>transcript (the variance explained by the "line" term can be
>>approximated to the genetic variance).
>>Gordon Smyth wrote:
>>"Finally, you can add the biolrep as a random effect using the
>>duplicateCorrelation() function with block argument, as explained in the
>>limma User's Guide, but I am not convinced yet that this is absolutely
>>necessary for your experiment."
>>I am not sure I understand what you mean here, but the random
>>effect I want to include is "line", not the biological replicate
>>(that is numbered 1:4 in each line/sex, but it says nothing about
>>relationships between each 1, each 2, etc...)
>>So, the two questions are:
>>1) How to treat "line" as random. I appreciate that this issue is
>>explained in chapter 8.2 of the limma user's guide, but I still
>>don't get how to fit the interaction and how to include my random
>>effect in my contrast, or in my toptable. For instance:
>>design <- model.matrix(~ sex*line, data=pData(data))
>>randomline <- duplicateCorrelation(eset, design,
>> block=rep(1:15,each=8))
>>fit <- lmFit(eset, design, block=rep(1:15, each=8),
>> cor=randomline$consensus)
>>I get this error:
>>Error in chol.default(V) :
>> the leading minor of order 2 is not positive definite
>>Probably because the block is exactly the same vector as the line,
>>already included in the design.
>>On the other hand, if I fit:
>>design <- model.matrix(~ sex, data=pData(data))
>>randomline <- duplicateCorrelation(eset, design,
>> block=rep(1:15,each=8))
>>fit <- lmFit(eset, design, block=rep(1:15, each=8),
>> cor=randomline$consensus)
>>There is no line:sex interaction, and the only effect I can obtain
>>is the effect of sex.
>>How can I get the effect of the line, and of the sex:line interaction?
>>2) Where can I find the variance components of my random effect?
>>Thanks in advance for any help,
>>paolo
>>
>>>>sessionInfo()
>>>R version 2.9.0 (2009-04-17) x86_64-unknown-linux-gnu
>>>locale:
>>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>
>>>
>>>attached base packages:
>>>[1] tcltk stats graphics grDevices utils datasets
>>>methods [8] base
>>>other attached packages:
>>>[1] statmod_1.4.0 qvalue_1.18.0 limma_2.18.1 affy_1.22.0
>>>Biobase_2.4.1 [6] maanova_1.14.0
>>>
>>>loaded via a namespace (and not attached):
>>>[1] affyio_1.12.0 preprocessCore_1.6.0 tools_2.9.0
>>Gordon K Smyth wrote:
>>>Dear Paolo,
>>>
>>>As Naomi Altman as already told you, analysing an experiment such
>>>as this is straightforward with limma. I guess the problem you
>>>are having is that you are trying to use the limma User's Guide's
>>>suggestion of forming a composite factor out of the individual
>>>factors (called the group means parametrization), and you don't
>>>know how to define contrasts for interactions from this
>>>factor. This does become a little more involved for experiments
>>>with more factors. Can I suggest that you instead make use of the
>>>factorial formulae in R when you make up the design matrix, then
>>>you can probably dispense with the contrast step altogether.
>>>
>>>You could for example use
>>>
>>> targets <- read.delim("targets.txt")
>>> design <- model.matrix(~Batch+Sex*(Phen/Line), data=targets)
>>>
>>>This will produce a design matrix with the following columns.
>>>
>>> > colnames(design)
>>> [1] "(Intercept)" "Batch" "SexM"
>>> [4] "PhenH" "PhenL" "PhenA:Line"
>>> [7] "PhenH:Line" "PhenL:Line" "SexM:PhenH"
>>>[10] "SexM:PhenL" "SexM:PhenA:Line" "SexM:PhenH:Line"
>>>[13] "SexM:PhenL:Line"
>>>
>>>To find genes significant for the sex x line interaction, you can simply use
>>>
>>> fit <- lmFit(eset, design)
>>> fit <- eBayes(fit)
>>> topTable(fit, coef=9:13)
>>>
>>>On the other hand,
>>>
>>> topTable(fit, coef=9:10)
>>>
>>>is the sex x phen interaction.
>>>
>>>Finally, you can add the biolrep as a random effect using the
>>>duplicateCorrelation() function with block argument, as explained
>>>in the limma User's Guide, but I am not convinced yet that this is
>>>absolutely necessary for your experiment.
>>>
>>>Can I also suggest that you stroll over to the mathematics
>>>department at Uppsala and talk to someone interested in
>>>bioinformatics and microarray analysis, say Professor Tom Britton,
>>>and see if you can get ongoing help with statistics and design issues.
>>>
>>>Best wishes
>>>Gordon
>>>
>>>
>>>>Date: Mon, 04 May 2009 14:09:14 +0200
>>>>From: Paolo Innocenti <paolo.innocenti at ebc.uu.se>
>>>>Subject: Re: [BioC] Yet another nested design in limma
>>>>Cc: AAA - Bioconductor <bioconductor at stat.math.ethz.ch>
>>>>
>>>>Hi all,
>>>>
>>>>since I received a few emails in my mailbox by people interested in a
>>>>solution for this design (or a design similar to this one), but there is
>>>>apparently no (easy) solution in limma, I was wondering if anyone could
>>>>suggest a package for differential expression analysis that allows
>>>>dealing with:
>>>>
>>>>nested designs,
>>>>random effects,
>>>>multiple factorial designs with more than 2 levels.
>>>>
>>>>I identified siggenes, maanova, factDesign that could fit my needs, but
>>>>I would like to have a comment by someone with more experience before
>>>>diving into a new package.
>>>>
>>>>Best,
>>>>paolo
>>>>
>>>>
>>>>
>>>>Paolo Innocenti wrote:
>>>>>Hi Naomi and list,
>>>>>
>>>>>some time ago I asked a question on how to model an experiment in limma.
>>>>>I think I need some additional help with it as the experiment grew in
>>>>>complexity. I also added a factor "batch" because the arrays were run in
>>>>>separate batches, and I think would be good to control for it.
>>>>>The dataframe with phenotypic informations ("dummy") looks like this:
>>>>>
>>>>> >> Phen Line Sex Batch BiolRep
>>>>> >> File1 H 1 M 1 1
>>>>> >> File2 H 1 M 1 2
>>>>> >> File3 H 1 M 2 3
>>>>> >> File4 H 1 M 2 4
>>>>> >> File5 H 1 F 1 1
>>>>> >> File6 H 1 F 1 2
>>>>> >> File7 H 1 F 2 3
>>>>> >> File8 H 1 F 2 4
>>>>> >> File9 H 2 M 1 1
>>>>> >> File10 H 2 M 1 2
>>>>> >> File11 H 2 M 2 3
>>>>> >> File12 H 2 M 2 4
>>>>> >> File13 H 2 F 1 1
>>>>> >> File14 H 2 F 1 2
>>>>> >> File15 H 2 F 2 3
>>>>> >> File16 H 2 F 2 4
>>>>> >> File17 L 3 M 1 1
>>>>> >> File18 L 3 M 1 2
>>>>> >> File19 L 3 M 2 3
>>>>> >> File20 L 3 M 2 4
>>>>> >> File21 L 3 F 1 1
>>>>> >> File22 L 3 F 1 2
>>>>> >> File23 L 3 F 2 3
>>>>> >> File24 L 3 F 2 4
>>>>> >> File25 L 4 M 1 1
>>>>> >> File26 L 4 M 1 2
>>>>> >> File27 L 4 M 2 3
>>>>> >> File28 L 4 M 2 4
>>>>> >> File29 L 4 F 1 1
>>>>> >> File30 L 4 F 1 2
>>>>> >> File31 L 4 F 2 3
>>>>> >> File32 L 4 F 2 4
>>>>> >> File33 A 5 M 1 1
>>>>> >> File34 A 5 M 1 2
>>>>> >> File35 A 5 M 2 3
>>>>> >> File36 A 5 M 2 4
>>>>> >> File37 A 5 F 1 1
>>>>> >> File38 A 5 F 1 2
>>>>> >> File39 A 5 F 2 3
>>>>> >> File40 A 5 F 2 4
>>>>> >> File41 A 6 M 1 1
>>>>> >> File42 A 6 M 1 2
>>>>> >> File43 A 6 M 2 3
>>>>> >> File44 A 6 M 2 4
>>>>> >> File45 A 6 F 1 1
>>>>> >> File46 A 6 F 1 2
>>>>> >> File47 A 6 F 2 3
>>>>> >> File48 A 6 F 2 4
>>>>>
>>>>>In total I have
>>>>>Factor "Phen", with 3 levels
>>>>>Factor "Line", nested in Phen, 6 levels
>>>>>Factor "Sex", 2 levels
>>>>>Factor "Batch", 2 levels
>>>>>
>>>>>I am interested in:
>>>>>
>>>>>1) Effect of sex (M vs F)
>>>>>2) Interaction between "Sex" and "Line" (or "Sex" and "Phen")
>>>>>
>>>>>Now, I can't really come up with a design matrix (not to mention the
>>>>>contrast matrix).
>>>>>
>>>>>Naomi Altman wrote:
>>>>>>You can design this in limma quite readily. Nesting really just means
>>>>>>that only a subset of the possible contrasts are of interest. Just
>>>>>>create the appropriate contrast matrix and you are all set.
>>>>>
>>>>>I am not really sure with what you mean here. Should I treat all the
>>>>>factors as in a factorial design?
>>>>>I might do something like this:
>>>>>
>>>>>phen <- factor(dummy$Phen)
>>>>>line <- factor(dummy$Line)
>>>>>sex <- factor(dummy$Sex)
>>>>>batch <- factor(dummy$Batch)
>>>>>fact <- factor(paste(sex,phen,line,sep="."))
>>>>>design <- model.matrix(~ 0 + fact + batch)
>>>>>colnames(design) <- c(levels(fact), "batch2")
>>>>>fit <- lmFit(dummy.eset,design)
>>>>>contrast <- makeContrasts(
>>>>> sex= (F.H.1 + F.H.2 + F.L.3 + F.L.4 + F.A.5 + F.A.6) - (M.H.1 +
>>>>>M.H.2 + M.L.3 + M.L.4 + M.A.5 + M.A.6),
>>>>> levels=design)
>>>>>fit2 <- contrasts.fit(fit,contrast)
>>>>>fit2 <- eBayes(fit2)
>>>>>
>>>>>In this way I can correctly (I presume) obtain the effect of sex, but
>>>>>how can I get the interaction term between sex and line?
>>>>>I presume there is a "easy" way, but I can't see it...
>>>>>
>>>>>Thanks,
>>>>>paolo
>>>>>
>>>>>
>>>>>>
>>>>>>--Naomi
>>>>>>
>>>>>>At 12:08 PM 2/16/2009, Paolo Innocenti wrote:
>>>>>>>Hi all,
>>>>>>>
>>>>>>>I have an experimental design for a Affy experiment that looks like
>>>>>>>this:
>>>>>>>
>>>>>>> Phen Line Sex Biol.Rep.
>>>>>>>File1 H 1 M 1
>>>>>>>File2 H 1 M 2
>>>>>>>File3 H 1 F 1
>>>>>>>File4 H 1 F 2
>>>>>>>File5 H 2 M 1
>>>>>>>File6 H 2 M 2
>>>>>>>File7 H 2 F 1
>>>>>>>File8 H 2 F 2
>>>>>>>File9 L 3 M 1
>>>>>>>File10 L 3 M 2
>>>>>>>File11 L 3 F 1
>>>>>>>File12 L 3 F 2
>>>>>>>File13 L 4 M 1
>>>>>>>File14 L 4 M 2
>>>>>>>File15 L 4 F 1
>>>>>>>File16 L 4 F 2
>>>>>>>
>>>>>>>
>>>>>>>This appears to be a slightly more complicated situation than the one
>>>>>>>proposed in the section 8.7 of the limma users guide (p.45) or by
>>>>>>>Jenny on this post:
>>>>>>>
>>>>>>>https://stat.ethz.ch/pipermail/bioconductor/2006-February/011965.html
>>>>>>>
>>>>>>>In particular, I am intersted in
>>>>>>>- Effect of "sex" (M vs F)
>>>>>>>- Interaction between "sex" and "phenotype ("line" nested)
>>>>>>>- Effect of "phenotype" in males
>>>>>>>- Effect of "phenotype" in females
>>>>>>>
>>>>>>>Line should be nested in phenotype, because they are random "strains"
>>>>>>>that happened to end up in phenotype H or L.
>>>>>>>
>>>>>>>Can I design this in limma? Is there a source of information about
>>>>>>>how to handle with this? In particular, can I design a single model
>>>>>>>matrix and then choose the contrasts I am interested in?
>>>>>>>
>>>>>>>Any help is much appreciated,
>>>>>>>paolo
>>>>>>>
>>>>>>>
>>>>>>>--
>>>>>>>Paolo Innocenti
>>>>>>>Department of Animal Ecology, EBC
>>>>>>>Uppsala University
>>>>>>>Norbyv?gen 18D
>>>>>>>75236 Uppsala, Sweden
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioconductor mailing list
>>>>>>>Bioconductor at stat.math.ethz.ch
>>>>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>Search the archives:
>>>>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>>Naomi S. Altman 814-865-3791 (voice)
>>>>>>Associate Professor
>>>>>>Dept. of Statistics 814-863-7114 (fax)
>>>>>>Penn State University 814-865-1348 (Statistics)
>>>>>>University Park, PA 16802-2111
>>>>>>
>>>>
>>>>--
>>>>Paolo Innocenti
>>>>Department of Animal Ecology, EBC
>>>>Uppsala University
>>>>Norbyv?gen 18D
>>>>75236 Uppsala, Sweden
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Douglas Lab
>University of Michigan
>Department of Human Genetics
>5912 Buhl
>1241 E. Catherine St.
>Ann Arbor MI 48109-5618
>734-615-7826
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
More information about the Bioconductor
mailing list