[BioC] Merging microarray datasets

Thu Apr 24 12:55:29 CEST 2008

> Thanks to Adai and Eric,
>
> Well, I'm trying to bring the discussion back to its previous direction, as
> it apparently drifted into a different area: cross-platform integration. :)
> I was wondering about integration within the same platform, an issue when
> there are multiple chips (in the case of Affymetrix) OR multiple print
> layouts (cDNA .gal files).
>
> "have to normalize all raw data together. All data should also be of one
> platform only. Then you can simply normalize all CEL files or all -----
> files together and be done." [courtesy: Balasubramanian]
>
> So, suppose I have MA1 and MA2 as respective normalised datasets (same
> platform, after normalizing by chip type in the case of Affymetrix, or by
> print layout in the case of cDNA). Can I just normalize them again for the
> final dataset, or do I need to take care of some other issues too (and how
> would I tackle them)? Also, I wonder if there's any smart package in this
> regard!
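
On the within-platform case: the quoted advice amounts to pooling the raw
arrays from both sets and normalizing them in one pass. Below is a minimal
plain-Python sketch of one common joint method, quantile normalization (the
normalization step used in RMA), assuming each array is a list of
intensities for the same probes in the same order. The function name is
hypothetical; for real CEL files you would use the Bioconductor
implementations (e.g. `normalize.quantiles` in preprocessCore) instead.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize arrays jointly so they share one intensity distribution.

    arrays: list of arrays, each a list of intensities for the same probes
    in the same order (an assumption of this sketch). Ties are broken by
    position, a simplification compared with production implementations.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices of this array, from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # each probe gets the reference value for its rank
        normalized.append(out)
    return normalized
```

Pooling means passing all arrays in a single call, e.g.
`quantile_normalize(ma1_arrays + ma2_arrays)` with hypothetical lists.
Afterwards every array has the same intensity distribution; that removes
distribution-level differences, but not gene-specific batch effects.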

I am not certain that there is a magic package that can do the best data
transformation for all situations.

You may well have to include a dataset effect in your analysis (as
Robert and others are recommending), and there are many (smart)
packages available in R to help you build models and estimate effects.
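
To make "include a dataset effect" concrete: for each gene, fit a linear
model with both the biological group and the dataset of origin as
covariates, so the group effect is estimated after accounting for the shift
between datasets. A minimal plain-Python sketch for a single gene, assuming
two datasets and two groups, each coded 0/1 (names and coding are
hypothetical). In R you would normally let `lm()` or limma's `lmFit()` do
this with a design matrix that includes a dataset term.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design is singular (is group confounded with dataset?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_with_dataset_effect(y, group, dataset):
    """Least-squares fit of y = b0 + b1*group + b2*dataset for one gene.

    y:       expression values (e.g. log2 scale), one per array
    group:   hypothetical 0/1 phenotype indicator per array
    dataset: hypothetical 0/1 indicator of origin (MA1 = 0, MA2 = 1)
    Returns [b0, b1, b2]; b1 is the group effect adjusted for the dataset effect.
    """
    X = [[1.0, float(g), float(d)] for g, d in zip(group, dataset)]
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)
```

If every array of one group comes from one dataset, the group and dataset
columns are identical and the system is singular: no model can separate
biology from batch in that design, which is worth checking before merging
anything.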

>
> Also Eric, I didn't understand which project you were talking about: "See
> for example the oncomine project."

The project aims at bundling heterogeneous expression data together.
http://www.ncbi.nlm.nih.gov/pubmed/15068665?dopt=AbstractPlus

> Thanks.
>
> Kathy
>
> On Thu, Apr 24, 2008 at 7:34 AM, Kort, Eric <Eric.Kort at vai.org> wrote:
>
>> > -----Original Message-----
>> > From: bioconductor-bounces at stat.math.ethz.ch
>> > Subject: Re: [BioC] Merging microarray datasets
>> >
>> >
>> > This is an interesting question and one that I would like to
>> > explore further.
>> >
>> > The papers I have seen on combining microarray datasets so
>> > far select one algorithm for Affymetrix and one algorithm for cDNA.
>> >
>> > Has anyone investigated which combination of preprocessing
>> > algorithm(s) makes data from these two platforms comparable?
>> > Indeed, how does one check if they are comparable? Any
>> > references and suggestions would be very welcome.
>>
>> Platforms aside, cDNA arrays are usually two-color and yield ratios,
>> whereas Affy arrays are one-color and do not.
>>
>> So one approach is to turn everything into ratios after preprocessing and
>> normalization (using a suitable set of reference samples, e.g. normal
>> kidney for kidney tumors). Obviously, thoughtful selection of reference
>> samples is required, and the reference chosen should be analogous to
>> whatever was used as the reference in the two-color arrays.
>>
>> Then, one can try to bring things further into line by converting the
>> log-transformed ratios into z-scores.
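
A minimal plain-Python sketch of the ratio-then-z-score idea. It assumes
intensities that are already preprocessed and normalized on the linear (not
log) scale, a reference profile taken as the per-gene mean of the reference
arrays, and z-scores computed within each array; whether to standardize per
array or per gene is a judgment call the description above leaves open. All
function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def reference_profile(reference_arrays):
    """Per-gene mean over the reference arrays (e.g. normal kidney samples)."""
    return [mean(values) for values in zip(*reference_arrays)]


def log_ratios(sample, reference):
    """Per-gene log2(sample / reference) from positive, normalized intensities."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must cover the same genes")
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def zscore(values):
    """Standardize one array's log-ratios to mean 0 and standard deviation 1."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("values are constant; z-scores are undefined")
    return [(v - m) / sd for v in values]
```

Each one-color array would then become
`zscore(log_ratios(array, reference_profile(ref_arrays)))`; the two-color
arrays already provide ratios against their own reference, so only the log2
and `zscore` steps apply to them.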
>>
>> As far as verification goes, a place to start would be box-and-whisker
>> plots to expose obvious abnormalities. You can also perform unsupervised
>> clustering to see whether the samples cluster mainly according to
>> platform/lab or mainly according to known phenotype.
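
For the clustering check, a minimal plain-Python stand-in: instead of a full
hierarchical clustering, it asks, for each array, whether its most
correlated other array shares its platform or its phenotype. High agreement
on platform combined with low agreement on phenotype is the warning sign
described above. The function names and toy labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two arrays measured on the same genes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant array")
    return sxy / math.sqrt(sxx * syy)


def neighbour_agreement(samples, labels):
    """Fraction of arrays whose most correlated other array has the same label."""
    hits = 0
    for i, s in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        nearest = max(others, key=lambda j: pearson(s, samples[j]))
        hits += int(labels[nearest] == labels[i])
    return hits / len(samples)
```

With real data the labels would be the lab or platform of each array and its
known phenotype; a proper clustering (e.g. `hclust` on a correlation
distance in R) gives the fuller picture.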
>>
>> Then, as Robert Gentleman stated, you can use appropriate models to
>> correct systematic biases.  But I will leave the details of that to the
>> statisticians (and, indeed, the archives of this list).
>>
>> Obviously, the whole exercise is fraught with difficulties, but it is
>> done. See for example the oncomine project. Whether it is done fruitfully
>> is open to argument. One thing to consider is to use downstream analysis
>> methods that care more about the relative position of genes and less
>> about the magnitude of values (e.g. GSEA or PGSEA).
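
To illustrate why rank-based summaries sidestep scale differences between
platforms: within-sample ranks are identical for an Affy-style log-intensity
profile and a cDNA-style log-ratio profile that order the genes the same
way. The mean-rank gene set score below is a deliberately simple,
hypothetical stand-in, not GSEA or PGSEA themselves.

```python
def ranks(values):
    """Rank each value within its sample (1 = lowest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def gene_set_rank_score(values, gene_names, gene_set):
    """Mean within-sample rank of the genes in gene_set, scaled to (0, 1]."""
    r = ranks(values)
    n = len(values)
    members = [r[i] / n for i, g in enumerate(gene_names) if g in gene_set]
    if not members:
        raise ValueError("no genes from the set are present")
    return sum(members) / len(members)
```

For example, log intensities `[8.1, 12.5, 6.0, 11.0]` and log-ratios
`[0.2, 2.1, -1.5, 1.4]` for the same four genes both rank as `[2, 4, 1, 3]`,
so any score built on the ranks alone is the same for both arrays.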
>>
>> >
>> > Thank you.
>> >
>> > Regards, Adai
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor