[BioC] Design matrix for simple time course
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Mon Mar 6 11:09:41 CET 2006
Thanks for the information - very clear and succinct :) I understand
the difference between the models, just not how the differently
structured design matrices relate to them.
From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
Sent: 03 March 2006 18:18
To: michael watson (IAH-C)
Subject: Re: [BioC] Design matrix for simple time course
michael watson (IAH-C) wrote:
> I am trying to create a design matrix for a simple, one-channel
> time-course experiment.
> I have five time points with three replicated arrays at each time point.
> I want to set up the design matrix.
> I tried using:
> Vaguely following the tutorial here
> However, I only have one factor to model, time.
> The matrix that comes out has a first column of all ones, the
> intercept. What I (think) I want is the first column to have three
> 1's and the rest 0's.
> I guess I'm really struggling as I don't know what the difference is
> between the output of model.matrix, with an Intercept column of all
> 1's, and the design matrix I want, which has a first column of three
> 1's at the top and the rest 0's.
This is a problem. If you are trying to analyze your data using a
sophisticated tool like limma but you don't understand the models you
are fitting, I would venture to say that you are putting the cart before
the horse. I would strongly recommend either finding a local
statistician who is willing to sit down with you and explain the
difference between a cell means and factor effects ANOVA model, or at
the very least perusing a textbook that covers these topics.
I would recommend something like 'Applied linear statistical models' by
Neter, Kutner, Nachtsheim and Wasserman, which gives many clear examples
and is highly approachable.
As a start, here is the basic difference between the two models. In a
factor effects model (the one with an intercept, given by all 1's in the
first column), the intercept term represents one time point (in this
case, the 1st timepoint), and all of the other four terms represent the
*difference* between the given timepoint and the first (e.g., time2 -
time1, time3 - time1, etc). In this scenario you might not need a
contrast matrix if these are the comparisons you are interested in. If
you want other comparisons then you have to do the algebra to figure out
the correct contrast matrix.
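
As a rough illustration of the factor effects parameterization, here is a minimal base R sketch. The factor name `time`, the level labels t1 to t5, and the expression values in `y` are all made up for the example (they are not from the original post), and it assumes R's default treatment contrasts.

```r
# Hypothetical layout: 5 time points, 3 replicate arrays each.
time <- factor(rep(paste0("t", 1:5), each = 3))

# Factor effects model: an intercept column of all 1's plus four
# indicator columns for time points 2 to 5.
design_fe <- model.matrix(~ time)
colnames(design_fe)
# "(Intercept)" "timet2" "timet3" "timet4" "timet5"

# Made-up expression values for one gene, with group means 1, 2, 4, 7, 11.
y <- rep(c(1, 2, 4, 7, 11), each = 3) + rep(c(-0.1, 0, 0.1), times = 5)

# Least-squares fit using the design matrix directly.
beta <- lm.fit(design_fe, y)$coefficients
beta
# (Intercept) is the time 1 mean; the other terms are differences
# from time 1 (t2 - t1, t3 - t1, t4 - t1, t5 - t1).

# Any other comparison needs a little algebra, e.g.
# t3 - t2 = (t3 - t1) - (t2 - t1):
t3_vs_t2 <- sum(c(0, -1, 1, 0, 0) * beta)
```

With these toy values the intercept comes out as the time 1 mean, and `t3_vs_t2` recovers the difference between the time 3 and time 2 means without refitting.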
In a cell means model, you are estimating the mean expression at each
timepoint, so you have to set up explicit contrasts to do whatever
comparisons you are interested in. As Ben Bolstad already noted, you fit
this model by adding a -1 (or a 0) to your call to model.matrix().
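
A minimal base R sketch of the cell means version follows. Again, the factor name `time`, the level labels, the values in `y`, and the contrast names are invented for illustration. In limma one would normally build the contrast matrix with makeContrasts() and apply it with contrasts.fit(); the hand-built matrix here only shows the idea.

```r
# Hypothetical layout: 5 time points, 3 replicate arrays each.
time <- factor(rep(paste0("t", 1:5), each = 3))

# Cell means model: dropping the intercept gives one indicator
# column per time point.
design_cm <- model.matrix(~ 0 + time)
colnames(design_cm)
# "timet1" "timet2" "timet3" "timet4" "timet5"

# The first column is 1 for the three time 1 arrays and 0 elsewhere.
design_cm[, 1]

# Made-up expression values for one gene, with group means 1, 2, 4, 7, 11.
y <- rep(c(1, 2, 4, 7, 11), each = 3) + rep(c(-0.1, 0, 0.1), times = 5)

# Each coefficient is now the mean expression at one time point.
mu <- lm.fit(design_cm, y)$coefficients

# Explicit contrasts: rows are coefficients, columns are comparisons.
contrast_mat <- cbind(t2_vs_t1 = c(-1, 1, 0, 0, 0),
                      t3_vs_t2 = c(0, -1, 1, 0, 0))
rownames(contrast_mat) <- colnames(design_cm)

# Estimated differences for the requested comparisons.
diffs <- drop(t(contrast_mat) %*% mu)
diffs
```

This is the design matrix with three 1's at the top of the first column. Both parameterizations fit the same model; they differ only in whether the differences are read off directly or obtained through contrasts.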
James W. MacDonald, M.S.
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
Ann Arbor MI 48109