[BioC] Merging microarray datasets
lgautier at altern.org
Thu Apr 24 12:55:29 CEST 2008
> Thanks to Adai and Eric,
> Well, I'm trying to bring back the discussion to the previous direction as
> it apparently went to a different area : cross-platform integration. :)
> I was wondering about integration within the same platform, an issue when
> there are multiple chips (in the case of Affymetrix) OR multiple
> print layouts (cDNA .gal files).
> have to normalize all raw data together. All data should also be of one
> platform only. Then you can simply normalize all CEL files or all -----
> files together and be done." [courtesy: Balasubramanian]
> So, suppose I have MA1, MA2
> as respective normalised datasets (same
> platform, after normalization by chip type in the case of
> Affymetrix, OR by print layout in the case of cDNA). Can I just normalize them
> again for the final dataset, or do I need to take care of some other issues
> (and how to tackle them)? Also, I wonder if there's any smart package in this
I am not certain that there is a magic package that can do the best data
transformation for all situations.
You may well have to include a dataset effect in your analysis (as
Robert and others recommend), and there are many (smart)
packages available in R to help you build models and estimate effects.
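
To make "dataset effect" concrete, here is a minimal sketch in plain Python
(not R, and not a Bioconductor function). It assumes log-scale values for a
single gene and a purely additive shift per dataset, and it is only sensible
when the phenotypes are balanced across datasets; otherwise the dataset term
should be fitted together with the phenotype in a proper model. The function
name and the numbers are hypothetical.

```python
from statistics import mean


def remove_dataset_effect(values, dataset_labels):
    """Subtract each dataset's additive offset from the grand mean.

    Assumption: the dataset effect is a constant shift on the log scale.
    """
    grand_mean = mean(values)
    by_dataset = {}
    for value, label in zip(values, dataset_labels):
        by_dataset.setdefault(label, []).append(value)
    # Offset of each dataset relative to the grand mean.
    offsets = {label: mean(vs) - grand_mean for label, vs in by_dataset.items()}
    return [value - offsets[label] for value, label in zip(values, dataset_labels)]


# Hypothetical log2 expression of one gene in two normalised datasets.
expr = [7.0, 7.4, 7.2, 9.1, 9.5, 9.3]
labels = ["MA1", "MA1", "MA1", "MA2", "MA2", "MA2"]
adjusted = remove_dataset_effect(expr, labels)
print(adjusted)
```

After the adjustment both datasets share the same mean for this gene, while
differences between samples within a dataset are left untouched.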
> Also Eric, I didn't get which project you were talking about: "See for
> example the oncomine project"
The project aims at bundling heterogeneous expression data together.
> = = = = = = = =
> On Thu, Apr 24, 2008 at 7:34 AM, Kort, Eric <Eric.Kort at vai.org> wrote:
>> > -----Original Message-----
>> > From: bioconductor-bounces at stat.math.ethz.ch
>> > Subject: Re: [BioC] Merging microarray datasets
>> > This is an interesting question and one that I would like to
>> > explore further.
>> > The papers I have seen on combining microarray datasets so
>> > far select one algorithm for Affymetrix and one algorithm for cDNA.
>> > Has anyone investigated which combination of preprocessing
>> > algorithm(s) make data from these two platforms comparable?
>> > Indeed, how does one check if they are comparable? Any
>> > references and suggestions would be very welcome.
>> Platforms aside, cDNA arrays are usually two color and ratios, and Affy
>> are one color and not ratios.
>> So one approach is to turn everything into ratios after preprocessing and
>> normalization (using, for example, a suitable set of reference
>> samples, e.g. normal kidney for kidney tumors). Obviously, thoughtful
>> selection of reference samples is required, and the reference chosen should
>> be analogous to whatever was used as the reference in the two color
>> experiments.
>> Then, one can try to bring things further into line by converting the
>> transformed ratios into z scores.
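
The ratio and z-score steps described above can be sketched as follows, in
plain Python rather than R. This is an illustration under assumptions, not
Eric's actual procedure: the reference is taken as the per-gene mean of the
reference samples, ratios are log2, and z scores are computed across genes
within one sample (computing them per gene across samples is an equally
plausible reading). All names and values are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def log_ratios(sample, reference_samples):
    """Per-gene log2 ratio of a one-color sample to the mean reference."""
    ratios = []
    for gene, value in enumerate(sample):
        reference = mean(ref[gene] for ref in reference_samples)
        ratios.append(log2(value / reference))
    return ratios


def z_scores(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


# Hypothetical normalised intensities for four genes.
tumor = [200.0, 50.0, 800.0, 100.0]
normals = [[100.0, 100.0, 100.0, 100.0],
           [100.0, 100.0, 100.0, 100.0]]

ratios = log_ratios(tumor, normals)
z = z_scores(ratios)
print(ratios)
print(z)
```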
>> As far as verification, a place to start would be box and whisker plots to
>> expose obvious abnormalities. You can also perform unsupervised clustering
>> to see if the samples cluster mainly according to platform/lab or mainly
>> according to known phenotype.
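
As a crude stand-in for the clustering check (a real analysis would use
hierarchical clustering or similar in R), one can compare how strongly
samples correlate when they share a platform versus when they share a
phenotype. The sketch below is plain Python; the helper names, labels and
profiles are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def mean_within_group_correlation(profiles, labels):
    """Average correlation over all sample pairs that share a label."""
    cors = [pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)
            if labels[i] == labels[j]]
    return mean(cors)


# Hypothetical transformed profiles: four samples, four genes each.
profiles = [[1, 2, 8, 9], [8, 9, 1, 2], [2, 3, 9, 10], [9, 10, 2, 3]]
platform = ["A", "A", "B", "B"]
phenotype = ["tumor", "normal", "tumor", "normal"]

by_platform = mean_within_group_correlation(profiles, platform)
by_phenotype = mean_within_group_correlation(profiles, phenotype)
print(by_platform, by_phenotype)
```

If the within-platform figure is clearly the larger of the two, the samples
are grouping by platform/lab rather than by biology.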
>> Then, as Robert Gentleman stated, you can use appropriate models to
>> correct systematic biases. But I will leave the details of that to the
>> statisticians (and, indeed, the archives of this list).
>> Obviously, the whole exercise is fraught with difficulties, but it is
>> done. See for example the oncomine project. Whether it is fruitfully
>> done is open to argument. One thing to consider is to utilize
>> analysis methods that care more about relative position of genes and less
>> about magnitude of values (e.g. GSEA or PGSEA).
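
A small illustration of why position-based methods are attractive here: the
ranking of genes within a sample does not change under any monotone
rescaling, so two platforms that report very different magnitudes can still
agree on the order. This plain-Python sketch does not implement GSEA or
PGSEA; the helper and the values are hypothetical, and ties are broken by
position.

```python
def ranks(values):
    """Rank genes within one sample, 1 = lowest value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical measurements of the same four genes on two platforms.
one_color_intensity = [120.0, 15.0, 900.0, 60.0]
two_color_log_ratio = [0.3, -2.1, 2.8, -0.7]

rank_a = ranks(one_color_intensity)
rank_b = ranks(two_color_log_ratio)
print(rank_a, rank_b)
```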
>> > Thank you.
>> > Regards, Adai