[BioC] How to cope with arrays hybridized at significantly different time.

Fri Mar 13 21:27:57 CET 2009

Good points. I would add: remember the three basic principles of 
experimental design:
1) Replication
2) Randomization
3) Blocking

If you have batch (or other "environmental") effects, you need multiple 
batches, with experimental conditions crossed with batches. Ideally, you 
randomize within each batch and keep the within-batch variation as 
tightly controlled as possible. A complete block design (batch = block, 
with every experimental condition represented in every batch) is 
probably preferable. You then have to account for the batch effect in 
the analysis: for example, if you are using a linear mixed model to 
analyze expression, include a batch effect (random or fixed) in it, as 
was suggested earlier.
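
As a rough illustration of the complete block idea, here is a minimal 
Python sketch that spreads every condition evenly over the batches and 
randomizes both the batch assignment and the processing order within 
each batch. The function name, the input format and the sample IDs are 
made up for the example; it only assumes that each condition has a 
number of samples divisible by the number of batches.

```python
import random

def complete_block_layout(samples_by_condition, n_batches, seed=0):
    """Assign samples to batches so every condition appears in every batch.

    samples_by_condition: dict mapping condition -> list of sample IDs
    (hypothetical input format). Each list length must be a multiple of
    n_batches. Returns a list of batches, each a list of
    (sample, condition) tuples in randomized processing order.
    """
    rng = random.Random(seed)
    batches = [[] for _ in range(n_batches)]
    for condition, samples in samples_by_condition.items():
        if len(samples) % n_batches != 0:
            raise ValueError(
                f"{condition}: sample count is not a multiple of n_batches")
        shuffled = list(samples)
        rng.shuffle(shuffled)  # randomize which sample lands in which batch
        for i, sample in enumerate(shuffled):
            batches[i % n_batches].append((sample, condition))
    for batch in batches:
        rng.shuffle(batch)     # randomize processing order within the batch
    return batches

# Made-up example: 6 affected and 6 control samples over 3 batches.
all_samples = {
    "affected": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "control": ["C1", "C2", "C3", "C4", "C5", "C6"],
}
layout = complete_block_layout(all_samples, n_batches=3)
for number, batch in enumerate(layout, start=1):
    print(f"batch {number}: {batch}")
```

Each batch then holds the same number of affected and control samples, 
so batch and condition are not confounded.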

Moreover, having repeats of the same experimental condition in each 
batch (for example, multiple affected and control samples per batch) 
allows you to test for a batch*condition interaction (and if that is 
significant... good luck with the interpretation...).
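
To make the interaction test concrete, here is a small Python sketch of 
the interaction F statistic for one gene in a balanced batch x condition 
layout with replicates in every cell. It treats batch as a fixed effect, 
which is only one of the two options mentioned above, and the function 
name and the expression values are invented for the example.

```python
from statistics import mean

def interaction_f(cells):
    """F statistic for the batch*condition interaction (balanced design).

    cells maps (batch, condition) -> list of replicate values for one gene.
    Every combination must be present with the same number (>= 2) of
    replicates. Returns (F, df_interaction, df_error).
    """
    batches = sorted({b for b, _ in cells})
    conditions = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(batches) * len(conditions):
        raise ValueError("every batch/condition combination is required")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need the same number (>= 2) of replicates per cell")
    if len(batches) < 2 or len(conditions) < 2:
        raise ValueError("need at least two batches and two conditions")

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(list(cell_mean.values()))
    batch_mean = {b: mean([cell_mean[(b, c)] for c in conditions])
                  for b in batches}
    cond_mean = {c: mean([cell_mean[(b, c)] for b in batches])
                 for c in conditions}

    # Interaction: what is left of the cell means after removing the
    # additive batch and condition effects.
    ss_int = n * sum(
        (cell_mean[(b, c)] - batch_mean[b] - cond_mean[c] + grand) ** 2
        for b in batches for c in conditions)
    # Error: replicate-to-replicate variation inside each cell.
    ss_err = sum((y - cell_mean[k]) ** 2
                 for k, values in cells.items() for y in values)

    df_int = (len(batches) - 1) * (len(conditions) - 1)
    df_err = len(batches) * len(conditions) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err

# Made-up data: the affected/control difference vanishes in batch 2.
example = {
    ("batch1", "affected"): [10.0, 10.2],
    ("batch1", "control"): [8.0, 8.2],
    ("batch2", "affected"): [11.0, 11.2],
    ("batch2", "control"): [11.0, 11.2],
}
f_stat, df_int, df_err = interaction_f(example)
print(f_stat, df_int, df_err)
```

The statistic would then be compared with an F distribution on 
(df_int, df_err) degrees of freedom to get a p-value. In a real 
microarray analysis you would fit this gene by gene with a proper 
modelling package rather than by hand.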

Even if you are working with "observational data" (meaning a 
non-designed experiment), if you have many samples you can probably 
account for some sources of variation. In that case, good annotation of 
the "environmental conditions" is a must.

If your method (for example, clustering) cannot account for multiple 
sources of variation, you may consider pre-whitening the data: first fit 
a linear model with batch and other systematic effects, then use the 
residuals from that model for the clustering and see whether the samples 
group together according to the experimental conditions of interest.
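
A minimal Python sketch of that idea, for the simplest case where batch 
is the only systematic effect: the residuals of a batch-only linear 
model are just each gene's values minus that gene's mean within the 
batch. The function names and the tiny data set are invented for the 
example, and it assumes the conditions are balanced across batches 
(otherwise centering by batch also removes part of the condition signal).

```python
import math
from collections import defaultdict
from statistics import mean

def remove_batch_means(expr, batch):
    """Residuals of a batch-only model: subtract per-batch gene means.

    expr:  dict sample -> list of per-gene expression values
    batch: dict sample -> batch label
    """
    members = defaultdict(list)
    for sample, label in batch.items():
        members[label].append(sample)
    n_genes = len(next(iter(expr.values())))
    residuals = {}
    for samples in members.values():
        centre = [mean([expr[s][g] for s in samples]) for g in range(n_genes)]
        for s in samples:
            residuals[s] = [expr[s][g] - centre[g] for g in range(n_genes)]
    return residuals

def nearest(sample, data):
    """Closest other sample by Euclidean distance (a stand-in for clustering)."""
    others = [s for s in data if s != sample]
    return min(others, key=lambda s: math.dist(data[sample], data[s]))

# Made-up data: batch 2 is shifted by +5 on both genes; the affected
# samples are +1 on the first gene only.
expr = {
    "A1": [6.0, 3.0], "C1": [5.0, 3.0],    # batch 1
    "A2": [11.0, 8.0], "C2": [10.0, 8.0],  # batch 2
}
batch = {"A1": "b1", "C1": "b1", "A2": "b2", "C2": "b2"}

resid = remove_batch_means(expr, batch)
print("nearest to A1 before adjustment:", nearest("A1", expr))
print("nearest to A1 after adjustment: ", nearest("A1", resid))
```

Before the adjustment the samples pair up by batch; on the residuals 
they pair up by condition. In Bioconductor, limma's removeBatchEffect 
serves a similar purpose for exploratory analyses such as clustering.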

Hope this helps.
Cheers,
JP

Michal Okoniewski wrote:
> Dear Triantafillos,
>
> Your question sounds like a serious problem in a real (clinical) 
> application of microarrays.
> To tell the truth, not many people have such big datasets, and many 
> are not aware of the sources of variability, especially at the stage 
> of RNA extraction, because Affy hybridization itself most often does 
> not add more variability than the extraction conditions (patient's 
> stress, sample degradation, habits and moods of the person who 
> gathers the material and extracts RNA).
> Anyway, there are some "rules of good practice" that could be 
> applied, e.g.:
>
> * keep precise and detailed annotation of samples; then you can try 
>   ANOVA to estimate the strength of the influencing factors
> * try to extract RNA under the same or similar conditions; if that 
>   is not possible, randomize the extractions
> * use as many replicates in the experiment as you can afford :)
> * do not pool unless you have a really good reason for it
> * define your goal and adjust the subset of your data and the types 
>   of analysis to it: e.g. if you need just an "expression signature" 
>   of 10-100 probesets, apply different methods and check how they 
>   overlap to avoid false positives; if you need an answer to a 
>   "biological question", use e.g. limma ANOVA with contrasts and 
>   play with pathways...
>
> The list is far from complete, but I think it would be interesting 
> to discuss good practices in the applications of big microarray 
> datasets, because this is the case where the science becomes really 
> directly applicable and useful...
>
> all the best,
> Michal
>
> Triantafillos Paparountas wrote:
>> Dear list,
>>
>> I would like to have your opinions on the following subject.
>>
>> In hospital studies we get more than 200 arrays per study most of 
>> the time. It is evident that the arrays differ significantly from 
>> one another due to different array batches and many other 
>> conditions, e.g. technical competence, hybridization differences 
>> due to the time span, circadian rhythm, fresh sample or not 
>> (different time from RNA extraction to hybridization), and others. 
>> How can we cope with the many uncontrollable factors and be able to 
>> use 80, 200 or even more arrays in the same analysis, correcting 
>> for any of the uncontrollable effects?
>>
>> I am using mostly Affymetrix arrays (Hu133plus2, MOE Gene 1 St, 
>> Moe 430 2), and currently my favorite software apart from 
>> Bioconductor are Partek's Gene Suite (which, at least according to 
>> the manual, can correct for uncontrolled effects) and Genespring, 
>> due to the magnificent clustering algorithm that it incorporates.
>>
>> Thanks in advance.
>>
>> T. Paparountas
>> www.bioinformatics.gr
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

-- 
=============================
Juan Pedro Steibel

Assistant Professor
Statistical Genetics and Genomics

Department of Animal Science & 
Department of Fisheries and Wildlife

Michigan State University
1205-I Anthony Hall
East Lansing, MI
48824 USA 

Phone: 1-517-353-5102
E-mail: steibelj at msu.edu