[BioC] Re: Design of Experiments

Gordon Smyth smyth at wehi.edu.au
Fri Aug 8 10:58:13 MEST 2003


At 06:47 AM 8/08/2003, Dave Waddell wrote:
>The problem I have is that I have walked in halfway through an
>experiment and I'm reminded of Fisher's 1938 statement
>"To call in the statistician after the experiment is done may be no more
>than asking him to perform a post-mortem examination: he may be able to
>say what the experiment died of." (Shamelessly borrowed from a relevant
>Nature article "DNA microarrays: Vital statistics" at
>http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6949/full/424610a_fs.html )
>
>That aside, here is my immediate problem. I have data from an experiment
>where all cancerous tissue is on Cy3 and all normal tissue on Cy5 so the
>design matrix looks like this for six arrays (1st problem - no dye swap
>experiments):
>design <- cbind(rep(0,6),rep(1,6))

Have you read the LIMMA User's Guide, especially the section on the swirl 
data set? You will see that for a simple replicated design like this the 
design matrix has only one column, design <- rep(1,6). In fact you don't 
need to define a design matrix at all, lm.series will construct it for you. 
I don't understand at all why you are defining a design matrix with a 
column of zeros, but never mind, lm.series should just ignore it.
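
As a rough illustration in base R only (this is not limma code, and the simulated values and variable names are made up for the sketch): an all-zero column makes the design matrix rank-deficient, so ordinary least squares has nothing to estimate for that column, whereas the single column of ones simply estimates the mean log-ratio.

```r
# Simulated log-ratios for one gene on six arrays (made-up values).
set.seed(1)
y <- rnorm(6, mean = 1, sd = 0.3)

# Design from the original post: a column of zeros plus a column of ones.
design_bad <- cbind(rep(0, 6), rep(1, 6))
# Single-column design for a simple replicated experiment.
design_ok <- matrix(1, nrow = 6, ncol = 1)

# Base R least squares, used here only to show the linear algebra.
fit_bad <- lm.fit(design_bad, y)
fit_ok <- lm.fit(design_ok, y)

fit_bad$rank          # the zero column contributes nothing to the rank
fit_bad$coefficients  # NA for the zero column, mean(y) for the ones column
fit_ok$coefficients   # mean(y)
```

Whether lm.series treats the zero column the same way is not something this sketch shows; it only demonstrates why the extra column carries no information.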

>ma <- read.marrayRawNH(fnames, path="C:/Temp", sep="", name.Rf="Rf",
>name.Gf="Gf", name.Rb="Rb", name.Gb="Gb", gnames = genes, layout =
>layout, targets = maTargets, header=F)
>
>maN <- maNorm(ma, norm="p")
>
>lmset <- maM(maN)

Have you checked that you have actually read in the data at this stage, 
e.g., summary(lmset)?

>lmser<-lm.series(lmset,design=design,ndups=3,spacing=3)

These must be very unusual arrays to have 3 duplicates at a spacing of 3!
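
To see what ndups=3, spacing=3 would imply for the spot layout, here is a small base R sketch. It reflects my reading of how duplicate spots are indexed, is not limma's own code, and uses a hypothetical 9-spot, 2-array matrix M:

```r
# Hypothetical layout: 9 spots on 2 arrays, ndups = 3, spacing = 3.
ndups <- 3
spacing <- 3
M <- matrix(1:18, nrow = 9, ncol = 2)

# The number of spots must be a whole multiple of ndups * spacing.
stopifnot(nrow(M) %% (ndups * spacing) == 0)

# Work out which rows of M are treated as duplicates of the same gene.
ngroups <- nrow(M) / (ndups * spacing)
idx <- array(seq_len(nrow(M)), dim = c(spacing, ndups, ngroups))
# Rows of dup_rows are genes; columns are the duplicate spots of that gene.
dup_rows <- matrix(aperm(idx, c(1, 3, 2)), ncol = ndups)
dup_rows
```

Under these assumptions, rows 1, 4 and 7 are read as the three duplicates of the first gene, rows 2, 5 and 8 as the second, and so on. The divisibility check is also why control spots cannot be dropped before this step if doing so leaves an uneven number of rows.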

>eb=maBayesian(lmset,design=design,ndups=3,spacing=3)

Well, you may as well not have used lm.series, because you're giving
maBayesian the original data matrix 'lmset' instead of the output from
lm.series, which is 'lmser'. But why are you using maBayesian from the
marrayTools package (which calls ebayes) instead of using ebayes itself? I
can't help you with maBayesian; you need to ask Jean.

>This returns a list with NAs in the first half of lmser and eb
>Lmser.$coefficients
>    [1]            NA            NA            NA            NA
>lmser.$stdev.unscaled
>    [1]        NA        NA        NA        NA        NA
> > eb
>$s2.prior
>   [1] 0.3825806        NA        NA        NA        NA        NA
>and so on.
>
>After that all hell breaks loose i.e. in
>toptable(lmser,number=30,genelist=genenb)
>Error in if (any(pos)) { : missing value where TRUE/FALSE needed
>
>To my understanding, the design matrix corresponds to a categorization
>of some kind, in this case cancerous and normal, and the actual values
>assigned to the categories are nominal.

Have you read the LIMMA User's Guide?

>The reason I asked about control spots refers back to an earlier email
>(the black holes concerning negative controls being less than
>background) and it would be nice to be able to normalize across arrays
>using negative controls.

I don't know what you have in mind here. Have you followed the recent 
series of emails on using weights?

>  I do subset and remove control spots (+ve, -ve,
>blanks, whatever) later since it can't be done until after any operation
>involving unwrapdups() because it will fail if ncols / ndups / spacing
>is not a whole number.
>
>Thanks, Dave.
>
>BTW, read.marrayRawNH() is my version which accepts files with no
>headers.
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: Thursday, July 31, 2003 9:10 PM
>To: Dave Waddell
>Cc: BioC Mailing List
>Subject: Re: Spelling mistakes and some questions re limma
>
>Dear Dave,
>
> >Can you point me to a place that would more fully explain the design
> >matrix and contrasts with respect to 2-colour dye experiments?
>
>My best suggestion at this time is:
>
>Yang, Y. H., and Speed, T. P. (2003). Design and analysis of comparative
>microarray experiments. In T. P. Speed (ed.), Statistical Analysis of Gene
>Expression Microarray Data. Chapman & Hall/CRC Press, pages 35-91.
>
>Thanks, I ordered the book, Dave.
>
>But basically limma is breaking new ground here so there are no good
>references for this stuff apart from the User's Guide itself. I am working
>on providing more user friendly interfaces to create design and contrast
>matrices and more documentation, but obviously these things take time. In
>the meantime, a local statistician would be able to give you some help. Or
>you could ask for help on bioconductor about specific designs.
>
> >  In some Bioconductor packages, the design matrix appears to be
> > applicable to the Cy3/Cy5 experiment as a whole and in others to the
> > individual Cy3 and Cy5 experiments.
>
>I am not clear what you mean here. As far as I know, limma is the only
>package to have the concept of a design matrix and limma is designed to
>analyze the whole experiment at once. Other packages basically assume you
>are making only one comparison usually with replicate arrays.
>
> >  It is very confusing. In addition, the meaning of a contrasts matrix and
> > how to put one together is not very clear. Both of these values, if
> > applied incorrectly, would appear to me (as a non-statistician assigned
> > to put together a package) to completely change the results.
>
>Yes, this is true.
>
> >  Finally, can you tell me how limma handles control spots?
>
>The only explicit handling of control spots in limma is in the plotMA
>function. I assume that you will leave the control spots in during the
>normalization (perhaps using weights to downweight ratio control spots or
>to upweight MSP titration spots) and you will remove them before doing
>inference about differential expression. There are subsetting commands to
>make removing control spots easy.
>
> >Thanks for a great package, Dave.
>
>Thanks for your comments.
>
>Gordon
