[BioC] limma - interpreting factorial design

Tue Feb 24 19:05:53 CET 2009

Hi Bjoern

Thanks again for taking the time to reply.

>if you are just concerned about the numerical values,

I am not concerned specifically with the numerical values. I am just looking at them to make sure I am interpreting and understanding correctly.

>just take the equations and "interpret" them:

That is difficult if you are not quite sure what you are looking at. 

>Thus, the estimated coefficient in example 2 is a quarter of that in 
>example 1. (And the interaction effect should be (Mu.S-Mu.U)-(WT.S-WT.U) )

So I multiply it by 4 to show they are equivalent. Check!

>However the grand mean is already directly estimated. (So there is no 
>need to multiply it by four. 

OK, maybe we are getting close to my problem. What do you mean by "already directly estimated"? Both the interaction and the grand mean are shown divided by 4 in the comparisons. How do you know that the first coefficient does not have to be multiplied by 4, but the fourth coefficient does? I can see it by looking at the actual numbers, but I cannot get it from the documentation. What about coefficients 2 and 3? Are they directly estimated?
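The following is a minimal R sketch of where the factor 4 comes from. It assumes R's `contr.sum` coding from the guide's sum-to-zero example, where the first level of each factor is coded +1 and the second -1, and it assumes the cell order WT.U, WT.S, Mu.U, Mu.S.

Each cell mean is a ±1 combination of the four coefficients. The 4×4 sign matrix `H` satisfies `t(H) %*% H = 4 * I`, so its inverse is `t(H)/4`. Every coefficient is therefore a signed sum of the four cell means divided by 4.

- For the grand mean, "sum of four means / 4" is already the average, so it is used as it is.
- For the interaction, the quantity of interest is the signed sum itself, so the coefficient has to be multiplied by 4.

`H` and `W` are illustrative names.

```r
# Cells in the guide's order; contr.sum codes the first level +1, the second -1
strain <- c(1, 1, -1, -1)   # WT = +1, Mu = -1
treat  <- c(1, -1, 1, -1)   # U  = +1, S  = -1
H <- cbind(gm = 1, Strain = strain, Treatment = treat,
           Interaction = strain * treat)
rownames(H) <- c("WT.U", "WT.S", "Mu.U", "Mu.S")

# cell means = H %*% coefficients, hence coefficients = solve(H) %*% cell means
W <- solve(H)

# Rows of 4 * W are the signed sums in the guide's table, before the /4:
# gm (1, 1, 1, 1), Strain (1, 1, -1, -1), Treatment (1, -1, 1, -1),
# Interaction (1, -1, -1, 1)
print(round(4 * W, 10))
```

Coefficients 2 and 3 follow the same rule. Each is a signed sum of cell means divided by 4, so multiplying them by 4 gives the sum of two simple effects, not an average effect.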

>Again try interpreting the equation given)

Do you mean the (WT.U+WT.S+Mu.U+Mu.S)/4? 

That is what I am trying to interpret. This says to me that the coefficient (divided or multiplied or left as it is) will give the grand mean. How do I work out which to do? It does not match what we did with the interaction above.    

If I set up a contrast matrix like the one below, it would extract the "directly estimated" grand mean as it is and the "not directly estimated" interaction multiplied by 4. But why?

contrast.matrix<-cbind(gm=c(1,0,0,0), dp=c(0,1,0,0), TNF=c(0,0,1,0), Interaction=c(0,0,0,4))
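For comparison, here is a sketch of a full translation from the sum-to-zero coefficients to the treatment-contrast ones. It makes two assumptions:

- R's `contr.sum` coding is used, as in the guide's sum-to-zero example, with levels No, Yes, so No is coded +1.
- The fits use plain `lm()` on the gene 1 values posted further down in this thread, rather than limma.

Only the interaction is a plain ×4. The intercept, which is the dp = No, TNF = No cell mean, needs weight 1 on all four coefficients, and the dp and TNF columns need -2s. The matrix `C` (an illustrative name) uses the row-per-coefficient layout that limma's `contrasts.fit()` expects. In principle, the same matrix could therefore be applied to a sum-to-zero `lmFit()` fit.

```r
# Gene 1 values as posted in this thread (rounded to 6 decimals)
y <- c(4.865401, 5.114202, 4.719609, 4.882969, 4.857923, 4.807370,
       4.538509, 4.759865, 4.779017, 4.430844, 4.519123, 4.499975)
dp  <- factor(rep(rep(c("Yes", "No"), each = 3), 2), levels = c("No", "Yes"))
TNF <- factor(rep(c("No", "Yes"), each = 6), levels = c("No", "Yes"))

# Sum-to-zero fit (contr.sum: "No" coded +1, "Yes" coded -1)
fit.sum <- lm(y ~ dp * TNF, contrasts = list(dp = "contr.sum", TNF = "contr.sum"))
# Treatment-contrast fit (R's default coding)
fit.trt <- lm(y ~ dp * TNF)

# Rows: sum-to-zero coefficients (Intercept, dp1, TNF1, dp1:TNF1)
# Columns: the treatment-contrast coefficient each combination reproduces
C <- cbind(Intercept   = c(1,  1,  1,  1),
           dp          = c(0, -2,  0, -2),
           TNF         = c(0,  0, -2, -2),
           Interaction = c(0,  0,  0,  4))
recovered <- drop(t(C) %*% coef(fit.sum))
print(cbind(recovered, treatment = coef(fit.trt)))
```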

>Otherwise "The R book" has a good section on contrasts.

I'll have a look at it, and at the StatSoft page.

>But it would be best to have a look at linear models and 
>parameterizations first.

I am doing so, but it does not seem to help with this question. As I said, I have run ANOVA and regressions on the data and everything comes out roughly as expected. But I must have a blind spot about this divide by 4.

>Potentially, it would be helpful to add a few comments to the 
>limmaUsersguide here?

I saw an email from Gordon Smyth on the mailing list saying that he would like feedback on this section.

Regards

John

---

-----Original Message-----
From: Bjoern Usadel [mailto:usadel at mpimp-golm.mpg.de] 
Sent: 24 February 2009 16:29
To: john seers (IFR)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] limma - interpreting factorial design

Hi John,

if you are just concerned about the numerical values, you can really 
just take the equations and "interpret" them:
Interaction effect
example 1:
(Mu.S-Mu.U)-(WT.S-WT.U)

example 2:
(WT.U-WT.S-Mu.U+Mu.S)/4

Thus, the estimated coefficient in example 2 is a quarter of that in 
example 1. (And the interaction effect should be (Mu.S-Mu.U)-(WT.S-WT.U) )

However the grand mean is already directly estimated. (So there is no 
need to multiply it by four. Again try interpreting the equation given)
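To make "interpreting the equation" concrete, here is a minimal sketch that evaluates the guide's comparisons directly on the four cell means of gene 1, using the values posted further down. It assumes the following level mapping, which is how `contr.sum` orders the levels here:

- dp plays the role of Strain, with No as WT and Yes as Mu.
- TNF plays the role of Treatment, with No as U and Yes as S.

The grand-mean comparison comes out at about 4.73, so the 18.92 reported after multiplying by 4 is four times the mean. By contrast, four times the interaction coefficient equals the example-1 interaction. The name `m` is illustrative.

```r
# Gene 1 values as posted in this thread (rounded to 6 decimals)
y <- c(4.865401, 5.114202, 4.719609, 4.882969, 4.857923, 4.807370,
       4.538509, 4.759865, 4.779017, 4.430844, 4.519123, 4.499975)
dp  <- factor(rep(rep(c("Yes", "No"), each = 3), 2), levels = c("No", "Yes"))
TNF <- factor(rep(c("No", "Yes"), each = 6), levels = c("No", "Yes"))

# 2 x 2 table of cell means: rows dp (No, Yes), columns TNF (No, Yes)
m <- tapply(y, list(dp, TNF), mean)
WT.U <- m["No", "No"];  WT.S <- m["No", "Yes"]
Mu.U <- m["Yes", "No"]; Mu.S <- m["Yes", "Yes"]

gm  <- (WT.U + WT.S + Mu.U + Mu.S) / 4    # coefficient 1: already the grand mean
int <- (WT.U - WT.S - Mu.U + Mu.S) / 4    # coefficient 4: a quarter of the interaction
full.int <- (Mu.S - Mu.U) - (WT.S - WT.U) # interaction as written in example 1

print(c(gm = gm, four.gm = 4 * gm, four.int = 4 * int, full.int = full.int))
```

Because the design is balanced (three arrays per cell), `gm` here also equals the plain mean of all 12 values.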

But it would be best to have a look at linear models and 
parameterizations first.

e.g.
http://www.statsoft.com/textbook/stglz.html
Otherwise "The R book" has a good section on contrasts.

If you don't want to pursue that further, use approach 1 in the limma 
guide, as this is usually the easiest one and helps you formulate the 
question you really want to ask.

Potentially, it would be helpful to add a few comments to the 
limmaUsersguide here?

HTH
Björn

john seers (IFR) wrote:
> Hi Bjoern
> 
> Thanks for the reply.
> 
> I am following the example on page 47 exactly, the only difference being that I use dp as Strain and TNF as Treatment.
> 
> Here are my factors, which show which measurements correspond to which treatment:
> 
>> dp
>  [1] Yes Yes Yes No  No  No  Yes Yes Yes No  No  No 
> Levels: No Yes
> 
>> TNF
>  [1] No  No  No  No  No  No  Yes Yes Yes Yes Yes Yes
> Levels: No Yes
> 
>> If you then compare these values with the ones you really want to 
>> extract you can come up with some simple transformations to do so.
> 
> I have not yet got to the stage of deciding what I "really" want to extract. I am trying to understand exactly why these two approaches are equivalent and what the figures actually represent.
> 
>> In your example you also seem to extract different things from the 
>> treatment-contrast parametrization than from the sum to zero 
>> parametrization.
> 
> In both cases I am extracting the main coefficients and seeing how they relate, so they will be different. I am not extracting anything specific yet. I am having trouble with a coefficient that is described as the "Grand mean" but is 4 times too big for what I think of as a grand mean.
> 
> The only directly comparable coefficient in the two approaches is the interaction, and the two values agree in the example (if multiplied by 4). So, if it is correct to multiply by 4, what is the interpretation of the grand mean coefficient of 18.9249361? If it is not correct to multiply by 4, what is the interpretation of an interaction coefficient that is 4 times smaller than the treatment-contrast coefficient?
> 
> I have run an ANOVA on this gene and, with a bit of fiddling, I can derive all the figures supplied by limma in both approaches and see how they are linked, except for when they should be 4 times bigger or 4 times smaller.
> 
> Regards
> 
> John
> 
> ---
> 
> -----Original Message-----
> From: Bjoern Usadel [mailto:usadel at mpimp-golm.mpg.de] 
> Sent: 24 February 2009 12:18
> To: john seers (IFR)
> Subject: Re: [BioC] limma - interpreting factorial design
> 
> Dear John,
> 
> could you please also post
> which of your measurements correspond to which treatment?
> 
> What helps a lot in interpretation is regrouping the terms on page 47 of 
> the user guide, e.g. (WT.U-WT.S+Mu.U-Mu.S)/4, and then comparing these to 
> other contrasts or the contrast of interest.
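Here is a short sketch of that regrouping for the Treatment comparison, using the gene 1 values posted below. It uses the same level mapping as in the guide: dp No/Yes as WT/Mu and TNF No/Yes as U/S. This mapping is an assumption about how John's factors line up.

The regrouping is:

(WT.U-WT.S+Mu.U-Mu.S)/4 = ((WT.U-WT.S)+(Mu.U-Mu.S))/4

That is the sum of the two within-strain treatment effects divided by 4, which is half of their average. Multiplying by 4 therefore gives the sum of the two simple effects, which is twice the usual main effect rather than the main effect itself. The names `m` and `simple` are illustrative.

```r
# Gene 1 values as posted in this thread (rounded to 6 decimals)
y <- c(4.865401, 5.114202, 4.719609, 4.882969, 4.857923, 4.807370,
       4.538509, 4.759865, 4.779017, 4.430844, 4.519123, 4.499975)
dp  <- factor(rep(rep(c("Yes", "No"), each = 3), 2), levels = c("No", "Yes"))
TNF <- factor(rep(c("No", "Yes"), each = 6), levels = c("No", "Yes"))

m <- tapply(y, list(dp, TNF), mean)   # rows dp (No, Yes), columns TNF (No, Yes)

# Within-strain treatment effects, i.e. WT.U - WT.S and Mu.U - Mu.S
simple <- m[, "No"] - m[, "Yes"]

treat.coef <- sum(simple) / 4    # the guide's Treatment comparison
avg.effect <- mean(simple)       # average treatment effect over both strains

print(c(treat.coef = treat.coef, four.times = 4 * treat.coef,
        avg.effect = avg.effect))
```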
> If you then compare these values with the ones you really want to 
> extract you can come up with some simple transformations to do so.
> 
> In your example you also seem to extract different things from the 
> treatment-contrast parametrization than from the sum to zero 
> parametrization.
> 
> contrast.matrix<-cbind(Intercept=c(1, 0, 0, 0), dp=c(0,1,0,0),
> TNF=c(0,0,1,0), Interaction=c(0,0,0,1))
> 
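> If TNF is a factor exactly as in the limma example, this would most likely 
> not extract the TNF main effect (with treatment contrasts, the TNF 
> coefficient is the TNF effect at the reference level of dp only). 
> Also, the intercept has a different meaning, which might explain the 
> differences.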
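To illustrate the point about the treatment-contrast parametrization, here is a minimal sketch with plain `lm()` on the gene 1 values posted below. It uses R's default `contr.treatment`, with reference level No for both factors.

- The intercept is the mean of the dp = No, TNF = No cell, not a grand mean.
- The TNF coefficient is the TNF effect within the dp = No group only. It differs from the TNF effect averaged over both dp groups.

The variable names are illustrative.

```r
# Gene 1 values as posted in this thread (rounded to 6 decimals)
y <- c(4.865401, 5.114202, 4.719609, 4.882969, 4.857923, 4.807370,
       4.538509, 4.759865, 4.779017, 4.430844, 4.519123, 4.499975)
dp  <- factor(rep(rep(c("Yes", "No"), each = 3), 2), levels = c("No", "Yes"))
TNF <- factor(rep(c("No", "Yes"), each = 6), levels = c("No", "Yes"))

fit.trt <- lm(y ~ dp * TNF)          # default treatment contrasts
b <- coef(fit.trt)
m <- tapply(y, list(dp, TNF), mean)  # rows dp (No, Yes), columns TNF (No, Yes)

ref.cell  <- m["No", "No"]                   # reference cell mean
tnf.in.no <- m["No", "Yes"] - m["No", "No"]  # TNF effect when dp = No
tnf.avg   <- mean(m[, "Yes"] - m[, "No"])    # TNF effect averaged over dp

print(c(intercept = unname(b["(Intercept)"]), ref.cell = ref.cell,
        TNFYes = unname(b["TNFYes"]), tnf.in.no = tnf.in.no, tnf.avg = tnf.avg))
```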
> 
> Best Wishes,
> Björn
> 
> john seers (IFR) wrote:
>> Hello All
>>
>> Can someone help me with unravelling a bit of confusion I have about the
>> limma factorial design?
>>
>> 8.7 Factor Designs (page 47 approx.) in the user guide has three
>> approaches that are basically equivalent. I am comparing the "sum to
>> zero" and the "treatment contrast" approaches. In the sum-to-zero
>> approach the comparisons are divided by 4, and this is where my
>> misunderstanding lies.
>>
>> Let me take the first gene as an example. I have put the expression
>> values below to give an idea of the magnitudes.
>>
>> With treatment contrasts, extracting the coefficients directly, I
>> get the following (code below):
>>
>> eb$coef[1,]
>> #  Intercept          dp         TNF Interaction 
>> # 4.84942088  0.05031631 -0.36610669  0.15883329
>>
>> With sum to zero the comparisons are divided by 4, so one way to
>> extract the coefficients is shown in the code below. Doing it this way
>> (in effect, multiplying by 4) I get the following:
>>
>> eb$coef[1,]
>> #         gm          dp         TNF Interaction 
>> # 18.9249361  -0.2594659   0.5733801   0.1588333
>>
>> So here is my problem. The grand mean looks 4 times too large, but the
>> interaction matches the interaction from the treatment-contrast
>> approach. So I can have one or the other "looking" right, but not both.
>> To multiply by 4 or not to multiply by 4, that is the question. How do I
>> interpret this? What am I missing in my understanding?
>>
>> Thanks for any help
>>
>>
>> Regards
>>
>> John
>>
>>
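>> # Sum to zero code
>> library(limma)
>> # design line not shown in the original post; assumed to follow the
>> # guide's sum-to-zero example (contr.sum for both factors)
>> design<-model.matrix(~dp*TNF, contrasts.arg=list(dp=contr.sum, TNF=contr.sum))
>> fit<-lmFit(eset, design)
>> # scale every coefficient by 4 to undo the /4 in the guide's comparisons
>> contrast.matrix<-cbind(gm=c(4,0,0,0), dp=c(0,4,0,0), TNF=c(0,0,4,0),
>> Interaction=c(0,0,0,4))
>> #contrast.matrix<-cbind(Interaction=c(0,0,-2,-2))
>> fit2<-contrasts.fit(fit, contrast.matrix)
>> eb<-eBayes(fit2)
>>
>>
>> # Treatment contrasts code
>> design<-model.matrix(~dp*TNF)
>> fit<-lmFit(eset, design)
>> contrast.matrix<-cbind(Intercept=c(1, 0, 0, 0), dp=c(0,1,0,0),
>> TNF=c(0,0,1,0), Interaction=c(0,0,0,1))
>> fit2<-contrasts.fit(fit, contrast.matrix)
>> eb<-eBayes(fit2)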
>>
>>
>> # Gene 1 expression level
>>
>> exprs1<-exprs[1,]
>> #     4.865401      5.114202      4.719609      4.882969      4.857923 
>> #     4.807370      4.538509      4.759865      4.779017      4.430844 
>> #     4.519123      4.499975
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
> 

-- 
--------------------------------------------------
Björn Usadel, PhD
Max Planck Institute of Molecular Plant Physiology
AG Integrative Carbon Biology
Am Muehlenberg 1
14476 Potsdam-Golm
Tel.: +49 331 5678153
email usadel at mpimp-golm.mpg.de
http://tinyurl.com/IntegrativeCarbonBiology