[BioC] How to cope with arrays hybridized at significantly different time.
Christos Hatzis
christos.hatzis at nuverabio.com
Fri Mar 13 22:07:34 CET 2009
The problem with the type of studies described in the original post is that
you don't really have control over the design and thus experimental design
principles are not helpful. The best approach might be to apply a simple
normalization to all arrays and try to model potential batch effects through
some meta-analytic method.
Robert Gentleman, among others, has done considerable work in this area,
which might serve as a starting point:
http://bioconductor.org/packages/2.3/bioc/vignettes/GeneMeta/inst/doc/GeneMeta.pdf
http://www.bepress.com/bioconductor/paper8/
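
As a rough illustration of the meta-analytic idea, the sketch below treats each batch as its own small study, estimates the case-vs-control difference for one gene within each batch, and then pools the per-batch estimates with inverse-variance weights. This is a simplified fixed-effect combination, not the method implemented in GeneMeta. The function names and the expression values are hypothetical.

```python
from statistics import mean, variance


def batch_effect_size(cases, controls):
    """Return (mean difference, variance of that difference) for one batch."""
    diff = mean(cases) - mean(controls)
    # Variance of a difference of two independent sample means.
    var = variance(cases) / len(cases) + variance(controls) / len(controls)
    return diff, var


def combine_fixed_effect(estimates):
    """Inverse-variance weighted average of (effect, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var


# Hypothetical log2 expression values for one gene, measured in two batches.
# Batch B is shifted upwards as a whole, which mimics a batch effect.
batches = {
    "batch_A": {"cases": [8.0, 8.5, 9.0], "controls": [7.0, 7.5, 8.0]},
    "batch_B": {"cases": [10.0, 10.5, 11.0], "controls": [9.0, 9.5, 10.0]},
}

estimates = [batch_effect_size(b["cases"], b["controls"]) for b in batches.values()]
pooled, pooled_var = combine_fixed_effect(estimates)
print(pooled, pooled_var)
```

Because the difference is computed within each batch, the overall shift of batch B cancels out before the estimates are combined. This only works if every batch contains both cases and controls.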
-Christos
Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park
Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
> Juan Pedro Steibel
> Sent: Friday, March 13, 2009 4:28 PM
> To: Michal Okoniewski
> Cc: Triantafillos Paparountas; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] How to cope with arrays hybridized at
> significantly different time.
>
> Good points. I would say: remember the three basic principles of
> experimental design:
> 1) Replication
> 2) Randomization
> 3) Blocking
>
> If you have batch (or other "environmental") effects, you
> need multiple batches, with experimental conditions crossed
> with batches. Ideally, you want to randomize within batch and
> keep the within batch variation as controlled as possible.
> Also, a complete block design (where all experimental conditions are
> represented in all batches, i.e. batch = block) is probably
> better. Then you have to account for the batch effect in the
> analysis. For example, if you are using a linear mixed model
> to analyze expression, you should include a batch effect
> (random or fixed) in it, as was suggested before.
>
> Moreover, having repeats of the same experimental condition
> in each batch (for example, multiple affected and control samples
> per batch) allows you to test for a batch*condition
> interaction (and if that is significant... good luck with the
> interpretation...).
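
To make the batch*condition interaction concrete, here is a minimal sketch for two batches that each contain replicated case and control samples. It only computes the interaction contrast (the difference between the two within-batch condition effects) from the cell means. A formal test would need a fitted model with an error term, which is not shown. The function name and the values are hypothetical.

```python
from statistics import mean


def interaction_contrast(cells, batch_1, batch_2):
    """Difference between the case-vs-control effects of two batches.

    cells maps (batch, condition) -> list of replicate values.
    Returns (contrast, per-batch effects).
    """
    effects = {}
    for batch in (batch_1, batch_2):
        effects[batch] = mean(cells[(batch, "case")]) - mean(cells[(batch, "control")])
    return effects[batch_1] - effects[batch_2], effects


# Hypothetical log2 expression values for one gene.
cells = {
    ("A", "case"): [9.0, 9.2],
    ("A", "control"): [8.0, 8.2],
    ("B", "case"): [11.0, 11.4],
    ("B", "control"): [10.8, 11.0],
}

contrast, effects = interaction_contrast(cells, "A", "B")
print(effects, contrast)
```

A contrast near zero means the condition effect is about the same in both batches. A large contrast means the effect depends on the batch, which is the hard-to-interpret case mentioned above.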
>
> Even if you are working with "observational data" (meaning
> non-designed experiment), if you have many samples, you can
> probably account for some sources of variation. In that case,
> having good annotation of "environmental conditions" is a must.
>
> If your model (for example, clustering) cannot account for
> multiple sources of variation, you may consider pre-whitening
> the data by first fitting a linear model with batch and other
> systematic effects, then using the residuals from that
> model to do your clustering and checking whether the samples group
> together according to the experimental conditions of interest.
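
A minimal sketch of this pre-whitening step, under the simplifying assumption that batch is the only systematic effect in the model. In that case the least-squares fitted value for each sample is simply its batch mean, so the residual is the value minus the batch mean. The function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_means(values, batches):
    """Residuals from a batch-only linear model: value minus its batch mean."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_mean[batch] for value, batch in zip(values, batches)]


# Hypothetical log2 expression values for one gene across four arrays.
values = [7.0, 9.0, 10.0, 12.0]
batches = ["A", "A", "B", "B"]
conditions = ["control", "case", "control", "case"]

residuals = remove_batch_means(values, batches)
print(list(zip(conditions, residuals)))
# residuals == [-1.0, 1.0, -1.0, 1.0]
```

After the adjustment the two controls and the two cases have identical residuals, so a clustering on the residuals would group by condition rather than by batch. If a condition appears in only one batch, this adjustment removes the condition signal along with the batch effect.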
>
> Hope this helps.
> Cheers,
> JP
>
>
>
> Michal Okoniewski wrote:
> > Dear Triantafillos,
> >
> > Your question sounds like a serious problem in a real (clinical)
> > application of microarrays.
> > To tell the truth, not many people have such big datasets, and many are
> > not aware of the sources of variability, especially at the stage of
> > RNA extraction, because Affy hybridization itself most often does not
> > add more variability than the extraction conditions (patient's stress,
> > sample degradation, habits and moods of the person who gathers the
> > material and extracts RNA).
> > Anyway - there are some "rules of good practice" that could be
> > applied, e.g.
> >
> > * keep precise and detailed annotation of samples - then you can try
> > to estimate the strength of influencing factors with ANOVA
> > * try to extract RNA in the same/similar conditions - if it is not
> > possible, randomize extractions
> > * use as many replicates in the experiment as you can afford :)
> > * do not pool unless you have a really good reason for it
> > * define your goal and adjust the subset of your data and types of
> > analysis to it - e.g. if you need just an "expression signature"
> > of 10-100 probesets, apply different methods and check how they
> > overlap to avoid false positives; if you need an answer to a
> > "biological question" - use e.g. limma ANOVA with contrasts and play
> > with pathways...
> >
> > The list is by far not complete, but I think it would be interesting
> > to discuss good practices in the applications of big microarray
> > datasets - because this is the case where the science becomes really
> > directly applicable and useful...
> >
> > all the best,
> > Michal
> >
> > Triantafillos Paparountas wrote:
> >> Dear list,
> >>
> >> I would like to have your opinions on the following subject.
> >>
> >> In hospital studies, most of the time we get more than 200 arrays per
> >> study. It is evident that the arrays have significant differences
> >> among them due to different array batches and many other conditions, e.g.
> >> technical competence, hybridization differences due to time span,
> >> circadian rhythm, fresh sample or not (different time from RNA
> >> extraction to hybridization), and others. How can we cope with the
> >> many uncontrollable factors and be able to use 80, 200 or even a
> >> higher number of arrays in the same analysis, correcting for any of the
> >> uncontrollable effects?
> >>
> >> I am using mostly Affymetrix arrays, Hu133plus2, MOE Gene 1 St,
> >> Moe 430 2, and currently my favorite software apart from
> >> Bioconductor are Partek's Gene Suite (which, at least according to
> >> the manual, can fix for uncontrolled effects), and Genespring due to
> >> the magnificent cluster algorithm that it incorporates.
> >>
> >> Thanks in advance.
> >>
> >> T. Paparountas
> >> www.bioinformatics.gr
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
>
>
> --
> =============================
> Juan Pedro Steibel
>
> Assistant Professor
> Statistical Genetics and Genomics
>
> Department of Animal Science &
> Department of Fisheries and Wildlife
>
> Michigan State University
> 1205-I Anthony Hall
> East Lansing, MI
> 48824 USA
>
> Phone: 1-517-353-5102
> E-mail: steibelj at msu.edu
>