[BioC] RE: Design matrix with multiple genotypes + quantified variables (+cor/regression)
Gordon Smyth
smyth at wehi.edu.au
Thu Aug 26 09:20:38 CEST 2004
At 12:33 AM 24/08/2004, Matthew Hannah wrote:
>Again, sorry for initially posting without too much investigation, but
>I have lots on (haven't we all) and I was hoping someone's experience
>could save me a lot of time. So here's an update.
>
>There are 2 basic questions -
>1. Are the design and contrast matrices below correct? Is there a better
>way to design it? My hypothesis is that treatment N - treatment A will
>be similar between genotypes, but the genotypes will be different from
>each other. I'm looking for the global treatment contrast, but don't
>want the genotype differences getting in the way. Is this already taken
>care of in the design below, or does the design need to be different? I.e.,
>is the lm contrast comparing (ConA, MutA, Mut2A) vs. (ConN, MutN, Mut2N)
>OR averaging (ConA-ConN, MutA-MutN, Mut2A-Mut2N)?
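For a cell-means design, the two readings in the question are algebraically the same thing: (ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3 is just the average of the three within-genotype differences, so genotype offsets cancel. A minimal base-R sketch for a single simulated gene follows; the data, seed, and offsets are hypothetical and only there to make the arithmetic visible.

```r
# Hypothetical single-gene data: 3 genotypes x 2 treatments, 4 replicates each,
# with large genotype offsets and a treatment effect of A - N = -1.
set.seed(1)
groups <- c("ConA", "ConN", "MutA", "MutN", "Mut2A", "Mut2N")
group  <- factor(rep(groups, each = 4), levels = groups)
offset <- c(ConA = 5, ConN = 6, MutA = 8, MutN = 9, Mut2A = 2, Mut2N = 3)
y <- offset[as.character(group)] + rnorm(length(group), sd = 0.1)

# Cell-means design: one coefficient per genotype/treatment group.
design <- model.matrix(~ 0 + group)
colnames(design) <- groups
cell_means <- coef(lm(y ~ 0 + design))
names(cell_means) <- groups

# Global treatment contrast (A - N) written on the six cell means.
treatment <- c(ConA = 1, ConN = -1, MutA = 1, MutN = -1, Mut2A = 1, Mut2N = -1) / 3
pooled_form <- sum(treatment * cell_means[names(treatment)])

# The same quantity as the average of the within-genotype differences.
within <- c(Con  = cell_means[["ConA"]]  - cell_means[["ConN"]],
            Mut  = cell_means[["MutA"]]  - cell_means[["MutN"]],
            Mut2 = cell_means[["Mut2A"]] - cell_means[["Mut2N"]])
averaged_form <- mean(within)

print(all.equal(pooled_form, averaged_form))
```

The genotype offsets (5, 8, 2) never enter the result, because each genotype appears once with +1/3 and once with -1/3.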
>
>2. What is the best way to find genes that correlate with a variable?
>I've done a fair bit on this now but still need some pointers. The
>obvious thing to do was a genewise Pearson correlation; however, in
>'Intro stats with R' there is the statement - "The reader should be
>warned that there are many incorrect uses of correlation coefficients,
>particularly when they are used in regression-type settings". Well, I'm
>duly warned but not sure what a regression-type setting is. Also, it
>seems that regression and Pearson give the same result.
>
>For the correlation I used cor, and it is then suggested to test that the
>correlation is significantly different from zero using cor.test. From
>comparing these it seems that there is a strict relationship between the
>p-value and the Pearson coefficient that only varies with sample number
>(# of arrays). The p-value just gives an indication of which Pearson
>values are significant - but surely you don't need to get it for all
>genes, as it just seems to rely on sample #?
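The observation holds for the standard Pearson test: the statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, so for a fixed number of arrays the p-value is a monotone function of |r|. A short base-R sketch, using simulated data and a hypothetical helper `p_from_r`, checks this against `cor.test` and derives the |r| cutoff that corresponds to p = 0.05.

```r
# Two-sided p-value for a Pearson correlation r computed from n observations.
# (p_from_r is an illustrative helper, not a base R function.)
p_from_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Simulated example with 16 arrays.
set.seed(2)
n <- 16
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
r  <- cor(x, y)
ct <- cor.test(x, y)
print(all.equal(p_from_r(r, n), ct$p.value))

# Invert the relationship: smallest |r| that reaches p = 0.05 for this n.
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)
print(r_crit)
```

This shortcut assumes every gene has the same number of non-missing arrays; if some genes have missing values, n differs per gene and the cutoff on r changes with it.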
>
>So I then proceeded with regression analysis using lm(). The output
>values that appear to be useful are the p-value and R-squared. The former
>is the same as from cor.test, and the latter is the squared Pearson
>coefficient, which I've just discussed. Am I missing something, or is
>there a better way?
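For a simple regression with one predictor, the equivalence described above is exact: R-squared equals the squared Pearson coefficient, and the slope's p-value equals the `cor.test` p-value. The sketch below checks both on simulated data; the variable names `growth` and `expr` are hypothetical stand-ins for the growth measure and one gene's expression.

```r
# Simulated data for one gene across 16 arrays (hypothetical values).
set.seed(3)
growth <- rnorm(16)
expr   <- 1 + 0.8 * growth + rnorm(16)

# Simple linear regression of expression on the growth measure.
fit     <- summary(lm(expr ~ growth))
slope   <- fit$coefficients["growth", "Estimate"]
slope_p <- fit$coefficients["growth", "Pr(>|t|)"]

# Pearson correlation and its test on the same data.
r      <- cor(expr, growth)
cor_p  <- cor.test(expr, growth)$p.value

print(all.equal(fit$r.squared, r^2))  # R-squared is the squared correlation
print(all.equal(slope_p, cor_p))      # identical p-values
print(slope)                          # what regression adds: change per unit of growth
```

What the regression adds over the correlation is the slope, in expression units per unit of growth, and the option of including further terms such as genotype in the same model.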
>
>Finally, as limma uses lm functions, can I do the regression using it, to
>provide access to the other tools such as eBayes, classifyTests or
>toptable? Or are they fundamentally different?
Yes, you can do the regressions using limma. No, the approaches are not
fundamentally different.
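
A minimal sketch of such a regression, assuming a hypothetical numeric covariate `growth` with one value per array and a simulated expression matrix `exprs_mat` (genes in rows). The limma calls shown (`lmFit`, `eBayes`, `topTable`) are the names used in current limma releases and run only if the package is installed; the base-R lines after them fit the same genewise least-squares slopes without moderation.

```r
# Simulated inputs (hypothetical): 100 genes on 16 arrays, one growth value per array.
set.seed(4)
n_genes  <- 100
n_arrays <- 16
growth    <- rnorm(n_arrays)
exprs_mat <- matrix(rnorm(n_genes * n_arrays), n_genes, n_arrays)
exprs_mat[1, ] <- exprs_mat[1, ] + 3 * growth   # gene 1 truly tracks growth

# Regression design: intercept plus slope on the growth measure.
design <- model.matrix(~ growth)

# limma route, guarded so the script still runs without the package.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(exprs_mat, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "growth", number = 5))
}

# Base-R equivalent of the unmoderated fit: lm() accepts a matrix response,
# one column per gene, and returns one slope per gene.
ols    <- lm(t(exprs_mat) ~ growth)
slopes <- unname(coef(ols)["growth", ])
print(which.max(abs(slopes)))
```

The coefficient named `growth` plays the same role as a contrast in the group designs: it is the quantity that is tested and ranked for each gene.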
Gordon
>Thanks for your time,
>Matt
>
>
>-----Original Message-----
>From: Matthew Hannah
>Sent: Thursday, 19 August 2004 14:56
>To: 'bioconductor at stat.math.ethz.ch'
>Subject: Design matrix with multiple genotypes + quantified variables
>
>Hi,
>
>After I asked before, this design and contrast matrix were suggested and
>they worked well. But now it gets complicated.
>2 genotypes - Con, Mut
>2 treatments - A, N.
>4 replicates
>
>treatments <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
>design <- model.matrix(~ 0+treatments)
>colnames(design) <- c("ConA","ConN","MutA","MutN")
>fit <- lmFit(esetgcrma, design)
>
>cont.matrix <- makeContrasts(ConA-MutA, ConN-MutN,
>Gen=(ConN+ConA-MutN-MutA)/2, ConA-ConN, MutA-MutN,
>treatment=(ConA+MutA-ConN-MutN)/2,levels=design)
>con.fit <- contrasts.fit(fit, cont.matrix)
>
>So what if I add a third genotype - Mut2?
>Is it the obvious: add treatments <- .....5,5,5,5,6,6,6,6)) and then for
>the contrasts treatment=(ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3?
>Or am I misunderstanding how to design contrasts? Is there an easier way
>of writing this when you have more genotypes?
>
>Also, logically the lm is treating all samples as independent when they
>are not. Does this matter? Is it possible to fit the original lm using a
>design taking genotype and treatment into account? Would this be a
>better approach, especially if you have more genotypes (e.g. 5-10)?
>What would the design matrix then look like?
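One way such a design might look is an additive two-factor model, `~ genotype + treatment`, which needs only one extra column per additional genotype. The sketch below builds it in base R for three genotypes and checks, on simulated data for one gene, that in a balanced layout the treatment coefficient is the genotype-adjusted N - A difference. The factor names and data are assumptions for illustration.

```r
# Hypothetical layout: 3 genotypes x 2 treatments x 4 replicates = 24 arrays.
genotype  <- factor(rep(c("Con", "Mut", "Mut2"), each = 8),
                    levels = c("Con", "Mut", "Mut2"))
treatment <- factor(rep(rep(c("A", "N"), each = 4), times = 3),
                    levels = c("A", "N"))

# Additive design: baseline (Con, A), genotype shifts, one treatment effect.
design <- model.matrix(~ genotype + treatment)
print(colnames(design))

# Simulated single gene with genotype shifts and a treatment effect N - A = 1.
set.seed(5)
geno_shift <- c(Con = 0, Mut = 3, Mut2 = -2)
y <- 5 + geno_shift[as.character(genotype)] + 1 * (treatment == "N") +
  rnorm(24, sd = 0.2)

# The treatmentN coefficient is the global treatment effect adjusted for genotype.
fit          <- lm(y ~ genotype + treatment)
treat_effect <- coef(fit)[["treatmentN"]]

# In a balanced design it equals the difference of the marginal treatment means.
marginal <- mean(y[treatment == "N"]) - mean(y[treatment == "A"])
print(all.equal(treat_effect, marginal))
```

Replacing `+` with `*` in the formula adds genotype-by-treatment interaction columns, which is the term to test if the question is whether the treatment response differs between genotypes.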
>
>Finally, what if you have a quantified variable for each genotype, like a
>measure of growth before and after the treatment? Can you specify this
>in any way (in the design matrix?) so you take this into account during
>the fit? I thought this was possible using lm or rlm, or am I confusing
>something? Alternatively, does anyone have a different approach, such as
>an efficient way of doing a gene-by-gene regression or correlation
>analysis against the growth measure, and extracting the genes that
>correlate best with the growth measure?
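A gene-by-gene correlation does not need a loop: `cor` accepts a matrix and returns one coefficient per column. The sketch below, on a simulated matrix `exprs_mat` with a hypothetical `growth` vector, computes all Pearson coefficients at once, converts them to p-values, adjusts for multiple testing with `p.adjust`, and ranks the genes.

```r
# Simulated inputs (hypothetical): 200 genes on 16 arrays.
set.seed(6)
n_genes  <- 200
n_arrays <- 16
growth    <- rnorm(n_arrays)
exprs_mat <- matrix(rnorm(n_genes * n_arrays), n_genes, n_arrays,
                    dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
exprs_mat["gene7", ] <- exprs_mat["gene7", ] + 3 * growth  # planted signal

# One Pearson coefficient per gene: arrays must be rows, so transpose.
r <- cor(t(exprs_mat), growth)[, 1]

# Convert to two-sided p-values and adjust for testing many genes.
t_stat <- r * sqrt(n_arrays - 2) / sqrt(1 - r^2)
p      <- 2 * pt(-abs(t_stat), df = n_arrays - 2)
ranked <- data.frame(gene = names(r), r = unname(r), p = unname(p),
                     adj_p = unname(p.adjust(p, method = "BH")),
                     stringsAsFactors = FALSE)

# Strongest correlations (positive or negative) first.
ranked <- ranked[order(-abs(ranked$r)), ]
print(head(ranked, 5))
```

With thousands of genes, the adjusted p-values matter more than the raw ones, since some large correlations are expected by chance alone on 16 arrays.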
>
>Perhaps there is a good (biologist-simple?) book that would cover
>the design and contrasts of lms. Does anyone know of one?
>
>Thanks again,
>Matt
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor