[BioC] limma design

Mon Jun 18 18:09:44 CEST 2007

Please quit taking things off-list. The list is intended to (a) help people directly and (b) serve as a searchable resource. If you take things off-list, you eliminate (b).

As to the 'correct' way of doing things: there is no such thing. In statistics there are assumptions you make about the underlying data, and if you violate those assumptions, your results may not mean what you think they mean. I wouldn't say it is 'incorrect' to violate assumptions; in microarray data analysis we do it all the time. When was the last time you checked whether the data you are fitting with these linear models followed anything even remotely resembling a Normal distribution?

So the question isn't about the right way to do things. Instead, the question is 'What am I doing by fitting a model this way, and do I think that is reasonable/defensible?'. If you choose to analyze data, you are responsible for knowing what you did and why you did it, and defending your choices. I don't think you can reasonably expect someone to take that responsibility for you.

Best,

Jim

>>> Lev Soinov <lev_embl1 at yahoo.co.uk> wrote:
> Hi Jim,
>    
>   Thank you for the reply.
>   I understand about SSR now; however, my question remains: what is the
> correct way of doing this? Should we analyse these treatments separately
> or together? We have the same cells here, only at different time
> points of a treatment, and we are interested only in differences between
> treated and untreated samples, not in differences between time points.
>    
>   Thank you very much for your help!
>   Lev.
>   
> "James W. MacDonald" <jmacdon at med.umich.edu> wrote:
>   Lev Soinov wrote:
>> Dear Gordon and List,
>> 
>> I would very much appreciate your comment on the experiment design in
>> LIMMA. It is about processing of experiments with multiple
>> treatments.
>> 
>> Let's say we have a simple Affy experiment with 16 samples collected
>> from a cell line (treated/untreated) at two time points:
>> 
>> - time point 1: 4 treated, 4 untreated
>> - time point 2: 4 treated, 4 untreated
>> 
>> We are interested in differential expression between treated and
>> untreated cells, at point 1 and point 2 separately. When we process all
>> samples together (normalise them together and fit linear models
>> using the whole dataset) in LIMMA, we get results different from
>> when we process data for points 1 and 2 separately (normalise them
>> together but fit linear models separately).
>> 
>> I do understand that it should be like this (more information for
>> priors), but I do not know whether there is some kind of criterion
>> to help decide whether to process them separately or in one go. It
>> seems that adding more treatments into the mix increases statistical
>> power and thus increases the number of genes classified as
>> differentially expressed. The latter seems a bit strange to me,
>> because the number of genes classified as differentially expressed in
>> one comparison (contrast) should not depend on the genes
>> differentially expressed in some other comparison (contrast).
> 
> Yes, but in one instance you are fitting a linear model and then computing
> contrasts, and in the other you are fitting two independent t-tests. In the
> former, your denominator will be based on the SSR (sum of squared residuals)
> from the linear model, which is computed using data from _all_ samples, not
> just those being compared. In the latter, the denominator is based only on
> the samples under consideration.
> 
> Best,
> 
> Jim
> 
> 
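A minimal sketch of the denominator difference described above, for a single gene, using ordinary t statistics rather than limma's moderated t (limma's lmFit/contrasts.fit/eBayes additionally shrink each gene's variance toward a prior estimated across all genes, which this sketch deliberately leaves out). The helper names (`group_ssr`, `contrast_t`) and the expression values are hypothetical, invented for illustration. It assumes a cell-means design with one group per treatment/time combination, so the residual variance is SSR / (n - number of groups).

```python
import statistics


def group_ssr(values):
    """Sum of squared residuals of one group around its own mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def contrast_t(groups, a, b):
    """Ordinary t statistic for mean(groups[b]) - mean(groups[a]).

    The residual variance is pooled over EVERY group passed in, as in a
    cell-means linear model: s2 = SSR / (n - number_of_groups).
    Returns (t, residual degrees of freedom).
    """
    n = sum(len(g) for g in groups.values())
    ssr = sum(group_ssr(g) for g in groups.values())
    df = n - len(groups)
    s2 = ssr / df
    diff = statistics.fmean(groups[b]) - statistics.fmean(groups[a])
    se = (s2 * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
    return diff / se, df


# Hypothetical log2 expression values for ONE gene (invented data).
groups = {
    "tp1_untreated": [7.0, 7.2, 6.9, 7.1],
    "tp1_treated":   [7.8, 8.0, 7.7, 8.1],
    "tp2_untreated": [7.0, 7.6, 6.6, 7.2],
    "tp2_treated":   [7.4, 8.2, 7.1, 7.9],
}

# All 16 samples in one model: the SSR comes from all four groups.
t_pooled, df_pooled = contrast_t(groups, "tp1_untreated", "tp1_treated")

# Time point 1 analysed on its own: only its 8 samples enter the SSR.
tp1_only = {k: v for k, v in groups.items() if k.startswith("tp1")}
t_sep, df_sep = contrast_t(tp1_only, "tp1_untreated", "tp1_treated")

print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")  # t = 3.52, df = 12
print(f"separate: t = {t_sep:.2f}, df = {df_sep}")        # t = 7.60, df = 6
```

The time point 1 contrast has the same numerator (0.85) in both fits; only the denominator changes. With these made-up numbers the noisier time point 2 samples inflate the pooled variance, so the pooled t is smaller despite the extra degrees of freedom. With less variable extra samples it could go the other way, which is why the number of genes called in one contrast can depend on the other samples in the model.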
