[BioC] block design and how it handles missing values in Limma
Gordon Smyth
smyth at wehi.edu.au
Tue Mar 1 04:26:57 CET 2005
At 01:49 PM 1/03/2005, xiaocui zhu wrote:
>Hello Gordon,
>
> > You seem to have a major problem with your analysis, which is that you
> > don't seem to have incorporated the dye swaps into your design matrix at
> > all. You define a vector call 'vector', but then make no use of it.
> > Perhaps
> > you should investigate the use of the functions in limma such as
> > modelMatrix() which use targets frame to construct design matrices.
>
>I am very sorry about the mistake in the code I gave. There should be a line
>" design<-design*vector " immediately following the definition of 'vector'.
>I missed the line when I was doing copy&paste between my existing script and
>the email.
>
> > As far as the 'block' argument is concerned, what makes you think that it
> > is "supposed" to return simple averages? There wouldn't be much point in
> > this functionality if it simply returned the same answers as you would get
> > without it.
>I assumed that the 'M' column output by the 'topTable' function returns the
>average of replicates. I found this was the case many times previously, so
>often so that I made the assumption and started to use the comparison
>between 'M' column of topTable output and the simple average to check if I
>have made any mistake in the codes. Apparently, I had jumped to the wrong
>conclusion. According to the definition of the 'topTable' function, "M is
>the estimate of the effect or the contrast, on the log2 scale". If you do
>not mind, I would like to hear how the calculation of this estimate is
>different from the simple averaging.
>
>If you haven't been completely turned off by my foolishness by now, I would
>like to ask for help with the two questions I posted in my first email:
>1) Is it appropriate to use block design in my case?
No it isn't appropriate here, for the reason explained in the "technical
replication" section of the Limma User's Guide.
>2) How does block design handles missing values of a dyeSwap pair? Why do
>I see that in a block design, if a pair of dyeSwap measurements has only
>one missing value for a feature, the M value of that feature does not equal
>to the average of the replicates, while the M value equals the average of
>replicates without the block factor?
Fitted values end up being simple means when the design is entirely balanced.
The estimation is by generalised least squares. The best reference is
http://www.bioinformatics.oupjournals.org/cgi/content/abstract/bti270v1
although here the blocking is on pairs of spots rather than blocks of arrays.
Gordon
>Thanks,
>
>Xiaocui
>
>
>
> > >Date: Sun, 27 Feb 2005 16:12:53 -0800
> > >From: "xiaocui zhu" <xzhu at caltech.edu>
> > >Subject: [BioC] block design and how it handles missing values in
> > > Limma
> > >To: <bioconductor at stat.math.ethz.ch>
> > >
> > >Hello all,
> > >
> > >I have a cDNA array data sets collected from a time-course experiment.
> > The
> > >experiment design was similar to the following:
> > > 1)Treat cells with ligands A, ligand B, or a vector control
> > > 2)Harvest cells at 1h and 2h
> > > 3)Measure expression changes in treated cells relative to
> > >time-matched-controls using 2-color cDNA arrays with a dyeSwap design
> > (each
> > >treated and time-matched-control sample pairs were hybridized onto two
> > >arrays with a dyeSwap).
> > >
> > >Step 1) to 3) were repeated three times, so that for each treatment
> > >condition, we have three biological and two technical repeats.
> > >
> > >I used Limma to identify differentially expressed genes in response to
> > each
> > >ligand, and genes differentially expressed in response to ligand A vs. to
> > >ligand B at each time point. Since each time the experiment was repeated,
> > >the cell preparation and other experimental conditions might vary
> > slightly,
> > >I thought that the data collected from one experiment can be considered a
> > >block to account for the batch variance. Parts of the codes taking into
> > >account the dyeSwap design and block factor are as the following:
> > >
> > >#Identify differentially expressed genes at each time point to each
> > ligand
> > >treatments <- factor(c(1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,
> > 4,4,4,4,4,4))
> > >vector<- c(1,-1,1,-1,1,-1, 1,-1,1,-1,1,-1, 1,-1,1,-1,1,-1,
> > >1,-1,1,-1,1,-1)
> > >design <- model.matrix(~ 0+treatments)
> > >colnames(design) <- c("A.1h","A.2h","B.1h","B.2h")
> > >fit<- lmFit(MA, design, block=c(rep(c(1,1,2,2,3,3),4)))
> > >efit <-eBayes(fit)
> > >for (i in 1:length(colnames(design))){
> > > output<-topTable(efit, coef = i, number=16200, adj="fdr")
> > > write.table(output, file = paste(colnames(design)[i], ".txt",
> > >sep=""), sep="\t")
> > > }
> > >
> > >When I examined the output files from the above codes, I was concerned
> > that
> > >the M value for some of the array features did not equal to the average
> > of
> > >the replicates, even though it's supposed to. This is only seen with
> > >features if a pair of its dyeSwap measurements had a "NA" value in only
> > one
> > >of the arrays. If both arrays of a dyeSwap pair gave a "NA" value for the
> > >feature, the M value would still be equal to the average of replicates as
> > >it's supposed to. This problem seemed to be caused by including the block
> > >factor in the lmFit statement, because no such inconsistency was found in
> > >the output if I removed the block factor. I don't know whether this
> > >inconsistency is due to some errors in my codes, or whether block design
> > >somehow handles missing value in a dyeSwap pair differently.
> > >
> > >My questions are:
> > >1) Is it appropriate to use block design in my case?
> > >2) How does block design handles missing values of a dyeSwap pair? Why do
> > I
> > >see that in a block design, if a pair of dyeSwap measurements has only
> > one
> > >missing value for a feature, the M value of that feature does not equal
> > to
> > >the average of the replicates?
> > >
> > >Any help to this matter will be greatly appreciated!
> > >
> > >Xiaocui
More information about the Bioconductor
mailing list