[BioC] How to cope with arrays hybridized at significantly different time.
Juan Pedro Steibel
steibelj at msu.edu
Fri Mar 13 21:27:57 CET 2009
Good points. I would add: remember the three basic principles of
experimental design:
1) Replication
2) Randomization
3) Blocking
If you have batch (or other "environmental") effects, you need multiple
batches, with experimental conditions crossed with batches. Ideally, you
want to randomize within batch and keep the within-batch variation as
tightly controlled as possible. A complete block design (where all
experimental conditions are represented in every batch, i.e. batch = block)
is probably better still. Then you have to account for the batch effect in
the analysis: for example, if you are using a linear mixed model to analyze
expression, you should include a batch effect (random or fixed) in it, as
was suggested before.
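
As a minimal sketch of the layout described above (Python is used purely
for illustration; the function name, the condition labels and the number
of batches are hypothetical), the following assigns every condition to
every batch and randomizes the processing order within each batch:

```python
import random

def randomized_complete_blocks(conditions, n_batches, reps_per_condition=1, seed=0):
    """Return {batch_id: [condition, ...]}.

    Every condition appears reps_per_condition times in every batch
    (complete block), and the run order is shuffled within each batch.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    layout = {}
    for batch in range(1, n_batches + 1):
        run_order = list(conditions) * reps_per_condition
        rng.shuffle(run_order)  # randomization within the batch
        layout[batch] = run_order
    return layout

# Hypothetical example: 3 batches, 2 affected and 2 control samples per batch.
layout = randomized_complete_blocks(
    ["affected", "control"], n_batches=3, reps_per_condition=2
)
for batch, run_order in layout.items():
    print(batch, run_order)
```

Because batch and condition are crossed, a batch term in the model can
later be separated from the condition effect.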
Moreover, having repeats of the same experimental condition in each
batch (for example, multiple affected and control samples per batch) allows
you to test for a batch*condition interaction (and if that is
significant... good luck with the interpretation...).
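
To illustrate why the within-batch repeats matter, here is a rough sketch
of the interaction F statistic from a balanced two-way ANOVA for a single
gene. The function name and the numbers are made up for illustration, and
a p-value would additionally need the F distribution, which is not
computed here. Without replicates the error degrees of freedom are zero
and the interaction cannot be tested.

```python
from statistics import mean

def interaction_f(cells):
    """cells[(batch, condition)] -> list of replicate values (balanced design).

    Returns (F, df_interaction, df_error) for the batch*condition term.
    """
    batches = sorted({b for b, _ in cells})
    conds = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(batches) * len(conds):
        raise ValueError("every batch must contain every condition")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need the same number (>= 2) of replicates per cell")

    cell_mean = {k: mean(v) for k, v in cells.items()}
    batch_mean = {b: mean(cell_mean[(b, c)] for c in conds) for b in batches}
    cond_mean = {c: mean(cell_mean[(b, c)] for b in batches) for c in conds}
    grand = mean(cell_mean.values())

    # Interaction: what is left of each cell mean after removing both main effects.
    ss_int = n * sum(
        (cell_mean[(b, c)] - batch_mean[b] - cond_mean[c] + grand) ** 2
        for b in batches for c in conds
    )
    # Error: replicate scatter around each cell mean.
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df_int = (len(batches) - 1) * (len(conds) - 1)
    df_err = len(cells) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err

# Made-up log-expression values: the affected-vs-control difference is
# 1.0 in batch b1 but 2.0 in batch b2, i.e. the effect depends on the batch.
cells = {
    ("b1", "affected"): [8.0, 8.2], ("b1", "control"): [7.0, 7.2],
    ("b2", "affected"): [10.0, 10.2], ("b2", "control"): [8.0, 8.2],
}
f_stat, df_int, df_err = interaction_f(cells)
print(f_stat, df_int, df_err)
```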
Even if you are working with "observational data" (meaning a non-designed
experiment), if you have many samples, you can probably account for some
sources of variation. In that case, having good annotation of
"environmental conditions" is a must.
If your method (for example, clustering) cannot account for multiple
sources of variation, you may consider pre-whitening the data: first fit
a linear model with batch and other systematic effects, then use the
residuals from that model to do your clustering and see whether the
samples group together according to the experimental conditions of interest.
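
A minimal sketch of that adjustment for one gene, assuming batch is the
only systematic effect in the model (in which case the residuals are just
the values centered on their batch mean). The function name and the data
are hypothetical; with additional covariates you would fit a full linear
model instead.

```python
from collections import defaultdict
from statistics import mean

def batch_residuals(values, batches):
    """Residuals from the batch-only model y = batch mean + error."""
    by_batch = defaultdict(list)
    for y, b in zip(values, batches):
        by_batch[b].append(y)
    batch_mean = {b: mean(ys) for b, ys in by_batch.items()}
    return [y - batch_mean[b] for y, b in zip(values, batches)]

# Made-up values for one gene: samples 1-2 come from batch b1, 3-4 from b2.
batches = ["b1", "b1", "b2", "b2"]
conditions = ["affected", "control", "affected", "control"]
values = [8.1, 7.1, 10.1, 9.1]

residuals = batch_residuals(values, batches)
for cond, raw, res in zip(conditions, values, residuals):
    print(cond, raw, round(res, 3))
```

On the raw values the samples pair up by batch; on the residuals they pair
up by condition, which is what you would feed to the clustering. Note that
this only works when conditions are spread across batches: if batch and
condition are confounded, the adjustment removes the signal of interest
as well.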
Hope this helps.
Cheers,
JP
Michal Okoniewski wrote:
> Dear Triantafillos,
>
> Your question sounds like a serious problem in a real (clinical)
> application of microarrays.
> To tell the truth, not many people have such big datasets, and many are
> not aware of the sources of variability, especially at the stage of RNA
> extraction, because Affy hybridization itself most often does not add
> more variability than the extraction conditions (patient's stress,
> sample degradation, habits and moods of the person who gathers the
> material and extracts the RNA).
> Anyway - there are some "rules of good practice" that could be
> applied, e.g.:
>
> * keep precise and detailed annotation of samples - then you can try
>   ANOVA to estimate the strength of the influencing factors
> * try to extract RNA under the same or similar conditions - if that is
>   not possible, randomize the extractions
> * use as many replicates in the experiment as you can afford :)
> * do not pool unless you have a really good reason for it
> * define your goal and adjust the subset of your data and the types of
>   analysis to it - e.g. if you need just an "expression signature" of
>   10-100 probesets, apply different methods and check how they overlap
>   to avoid false positives; if you need an answer to a "biological
>   question", use e.g. limma ANOVA with contrasts and play with
>   pathways...
>
> The list is far from complete, but I think it would be interesting
> to discuss good practices in the applications of big microarray
> datasets - because this is the case where the science becomes really
> directly applicable and useful...
>
> all the best,
> Michal
>
> Triantafillos Paparountas wrote:
>> Dear list,
>>
>> I would like to have your opinions on the following subject.
>>
>> In hospital studies we usually get more than 200 arrays per study. It
>> is evident that the arrays differ significantly among themselves due to
>> different array batches and many other conditions, e.g. technical
>> competence, hybridization differences due to the time span, circadian
>> rhythm, fresh sample or not (-> different time from RNA extraction to
>> hybridization), and others. How can we cope with the many
>> uncontrollable factors and be able to use 80, 200 or even more arrays
>> in the same analysis, correcting for any of the uncontrollable effects?
>>
>> I am mostly using Affymetrix arrays (Hu133plus2, MOE Gene 1 St,
>> Moe 430 2), and currently my favorite software apart from Bioconductor
>> are Partek's Gene Suite (which, at least according to the manual, can
>> correct for uncontrolled effects) and Genespring, due to the magnificent
>> clustering algorithm that it incorporates.
>>
>> Thanks in advance.
>>
>> T. Paparountas
>> www.bioinformatics.gr
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
--
=============================
Juan Pedro Steibel
Assistant Professor
Statistical Genetics and Genomics
Department of Animal Science &
Department of Fisheries and Wildlife
Michigan State University
1205-I Anthony Hall
East Lansing, MI
48824 USA
Phone: 1-517-353-5102
E-mail: steibelj at msu.edu