[Bioc-devel] Making hypothesis testing easier with design matrices?

Tue Dec 11 10:59:54 CET 2012

Dear Gordon,

On 12/10/2012 11:06 PM, Gordon K Smyth wrote:
> I don't see a proposal below, only a question.

Yes, I ended up not really proposing anything because I realized that I 
didn't really have anything that improves on linear modeling. But see 
below for what I was trying to get at.

> What linear modelling programs such as limma and edgeR do is not 
> "simply computing logCPM individually for each sample and taking the 
> mean of all the samples in each group".
This is what I thought, and I just wanted to make sure. Which brings me to...
> I think that it might be better for you to try to understand what 
> linear modelling is doing in a more sophisticated and complete manner 
> before trying to redesign the process.  What about reading a textbook 
> on linear modelling?
This is exactly what I will do.

However, I want to clarify that I'm not trying to redesign the process 
of linear modeling. I'm trying to find a middle ground between simple 
but inflexible two-group comparisons and flexible but confusing (for 
the biologists I work with, and sometimes for me) linear modeling in 
its full generality. I think my idea of a no-intercept factor of 
interest plus sum-to-zero blocking factors accomplishes that goal for my 
purposes. It is a subset of the full possibilities of (generalized) 
linear modelling, but one which (unlike simple two-group comparisons) 
encompasses every experimental design I've encountered so far, and 
does so in a way that makes sense to my biologist collaborators, since 
each group mean gets its own coefficient in the design matrix.
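
As a rough illustration of what such a design matrix looks like, here is a 
minimal sketch in plain Python. It is not limma or edgeR code: the helper 
names design_matrix and least_squares, the group and batch labels, and the 
numbers are all made up for illustration. In R, the analogous matrix would 
presumably come from something like 
model.matrix(~ 0 + group + batch, contrasts.arg = list(batch = "contr.sum")).

```python
from fractions import Fraction


def design_matrix(groups, batches):
    """Indicator columns (no intercept) for the factor of interest,
    plus sum-to-zero columns for one blocking factor."""
    g_levels = sorted(set(groups))
    b_levels = sorted(set(batches))
    names = ["group" + g for g in g_levels]
    names += ["batch%d" % (i + 1) for i in range(len(b_levels) - 1)]
    rows = []
    for g, b in zip(groups, batches):
        # One 0/1 column per group, so each group gets its own coefficient.
        row = [1 if g == lvl else 0 for lvl in g_levels]
        # Sum-to-zero coding: the last batch level is -1 in every batch column.
        if b == b_levels[-1]:
            row += [-1] * (len(b_levels) - 1)
        else:
            row += [1 if b == lvl else 0 for lvl in b_levels[:-1]]
        rows.append(row)
    return names, rows


def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y exactly by Gauss-Jordan
    elimination. Assumes X has full column rank."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y], kept as exact fractions.
    A = [
        [Fraction(sum(r[i] * r[j] for r in X)) for j in range(p)]
        + [Fraction(sum(r[i] * v for r, v in zip(X, y)))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical balanced experiment: 3 groups, each measured once in 2 batches.
groups = ["A", "A", "B", "B", "C", "C"]
batches = ["b1", "b2", "b1", "b2", "b1", "b2"]
y = [10, 13, 20, 22, 30, 31]  # made-up values for a single gene

names, X = design_matrix(groups, batches)
coef = least_squares(X, y)
for name, c in zip(names, coef):
    print(name, float(c))

# A two-group comparison is then just a difference of two coefficients.
b_minus_a = coef[names.index("groupB")] - coef[names.index("groupA")]
print("B - A:", float(b_minus_a))
```

In this balanced layout the three group coefficients come out equal to the 
plain group means (11.5, 21 and 30.5), because the sum-to-zero batch column 
is orthogonal to the group columns. In an unbalanced design they would 
instead be batch-adjusted group estimates, not raw means.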

Thank you, Gordon, for your patience with me as I stumble around with my 
limited understanding of linear modelling. I will try to find a good 
textbook on the subject and improve my understanding.

Sincerely,

-Ryan