[Bioc-devel] Making hypothesis testing easier with design matrices?

Tue Dec 11 06:41:34 CET 2012

Dear Gordon,

After a bit of pen-and-paper work, I see what you mean about additive 
models. I constructed a simple 2x2 additive model (i.e. "~a+b" where a 
and b each have 2 levels) and tried to solve for all 4 group means, but 
found that it was impossible. The best one can do is solve for two of 
the four, plus the mean of the other two. Clearly an interaction term 
would be required to resolve the other two. So I see that my proposal is 
indeed impossible to carry out in the general case, and in every case 
where it is possible, one may as well use a no-intercept parametrization 
and be done with it. Thanks for clarifying.
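
The 2x2 exercise above can be sketched as a rank check. This is only an
illustration, written in plain Python so it runs without any packages;
the treatment-style 0/1 coding and the names rank, additive and
interaction are assumptions made for the example, not anything taken
from edgeR or limma. It shows that the additive design spans only 3
dimensions, so the 4 group means cannot all be free, while adding the
interaction column restores full rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        # Find a row at or below position rk with a nonzero entry here.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [v - f * w for v, w in zip(m[i], m[rk])]
        rk += 1
    return rk


# One row per group; a and b are 0 for level 1 and 1 for level 2.
groups = [(0, 0), (1, 0), (0, 1), (1, 1)]

# "~a+b": intercept, a2, b2
additive = [[1, a, b] for a, b in groups]
# "~a*b": intercept, a2, b2, a2:b2
interaction = [[1, a, b, a * b] for a, b in groups]

rank_additive = rank(additive)
rank_interaction = rank(interaction)

# The additive model forces mu11 - mu21 - mu12 + mu22 = 0: applying this
# contrast to the rows of the additive design gives the zero vector.
contrast = [1, -1, -1, 1]
constraint = [
    sum(c * row[j] for c, row in zip(contrast, additive)) for j in range(3)
]

print(rank_additive, rank_interaction, constraint)
```

The one linear constraint among the four group means is the interaction
contrast, which is why an interaction term is what it takes to separate
the remaining two groups.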

What about this more limited proposal? Suppose one is studying an 
additive model, but only one factor (or equivalently, one set of 
interacting factors) is of interest and the rest are blocking factors. 
For example, suppose the model is "~condition + donor", but donor is 
just a blocking factor and only condition is of interest. If one used 
the no-intercept formula "~0+condition+donor" and set "donor" to use 
sum-to-zero contrasts, then am I correct in thinking that the 
coefficients corresponding to levels of "condition" would then be usable 
as estimates of the average logCPM for each condition? If so, would 
these estimates be any better than simply computing logCPM individually 
for each sample and taking the mean of all the samples in each group?
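
A minimal sketch of the coding in question, under loud assumptions: it
uses ordinary least squares on made-up logCPM-like values rather than
the negative binomial GLM that edgeR fits, and every name and number in
it (the samples, design_row, fit_condition_coefs, raw_means) is
hypothetical. The donor columns follow the usual sum-to-zero pattern
for a 3-level factor. In this toy linear setting the condition
coefficients match the raw per-condition means when the design is
balanced, and can differ once a sample is missing.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for i in range(n):
            if i != col:
                f = m[i][col]
                m[i] = [v - f * w for v, w in zip(m[i], m[col])]
    return [row[n] for row in m]


def ols(x, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Sum-to-zero codes for a 3-level donor factor (two columns).
DONOR_SUM_CODES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def design_row(condition, donor):
    """Row for "~0+condition+donor": one column per condition, then donor."""
    cond_cols = (1, 0) if condition == "A" else (0, 1)
    return list(cond_cols) + list(DONOR_SUM_CODES[donor])


def fit_condition_coefs(samples):
    x = [design_row(c, d) for c, d, _ in samples]
    y = [v for _, _, v in samples]
    coefs = ols(x, y)
    return {"A": coefs[0], "B": coefs[1]}


def raw_means(samples):
    out = {}
    for cond in ("A", "B"):
        vals = [Fraction(v) for c, _, v in samples if c == cond]
        out[cond] = sum(vals) / len(vals)
    return out


# Hypothetical (condition, donor, value) samples; values are invented.
balanced = [("A", 1, 5), ("A", 2, 6), ("A", 3, 7),
            ("B", 1, 7), ("B", 2, 9), ("B", 3, 11)]
unbalanced = balanced[:-1]  # drop the (B, donor 3) sample

balanced_coefs = fit_condition_coefs(balanced)
balanced_means = raw_means(balanced)
unbalanced_coefs = fit_condition_coefs(unbalanced)
unbalanced_means = raw_means(unbalanced)

print(balanced_coefs, balanced_means)
print(unbalanced_coefs, unbalanced_means)
```

In the unbalanced case the coefficient for a condition is the fitted
value averaged equally over donors, including a donor that was never
observed in that condition, which is where it parts ways with the raw
mean. Whether the same holds for the GLM fit is not something this
sketch addresses.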

Sincerely,
-Ryan

On Mon 10 Dec 2012 05:56:07 PM PST, Gordon K Smyth wrote:
>
> Dear Ryan,
>
> Thanks for your suggestion. I think though that the attribute that
> you are thinking of implementing is not actually something that exists
> in general.
...
> This is so only for one-way designs, i.e., for single factor experiments.
...
> For additive models however, I think there is no shortcut to a user
> trying to understand what the fitted coefficients represent.
>
> Best wishes
> Gordon