[BioC] analysis of HG_U95A vs. HG_U95Av2
rscharpf at jhsph.edu
Thu Jul 19 14:21:51 CEST 2007
On Jul 19, 2007, at 6:00 AM, bioconductor-request at stat.math.ethz.ch wrote:
> Alex Tsoi wrote:
>> Dear all,
>> I have a cancer dataset from GEO that is labeled as having the
>> platform GPL91 (HG-U95A). When I use justRMA() to read the data, I
>> find that the GSMs come from both HG_U95A and HG_U95Av2, and that
>> gives me an error. I could analyze the data separately, but I just
>> want to ask if anyone has comments about the difference between the
>> two platforms, and whether I could treat the data as coming from one
>> platform and analyze them together (e.g. by using RMA); of course, in
>> that case I would have to "make" R believe that they are from only
>> one platform. Or what is the most proper way to analyze these kinds
>> of data?
>> I would greatly appreciate any help and comments.
>> P.S.: this is a cancer dataset with two types of disease state, and
>> each type could come from either HG-U95A or HG_U95Av2.
> This is a difficult problem since there are platform-specific effects.
> For example, you might think that a probeset which is shared between
> the two platforms would be safe to compare, but unfortunately it will
> behave slightly differently on one platform than on the other, even
> though in theory it is measuring the same thing.
> You could start by just normalizing these two array types in separate
> pools. Then you could take probesets that are supposedly shared
> between them and look to see how they are behaving in their respective
> conditions. In general, I expect you will find that shared probesets
> move in the same direction on each platform under your experimental
> conditions, but that you get different absolute results on one
> platform than on the other for a given condition. In other words, both
> the condition and the platform will contribute to the overall signal.
> The easiest thing is always to look at one platform at a time, but if
> you *must* combine them, grab a statistician first to help you do
> something sensible.
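
The check described above (normalize each array type in its own pool, then see whether shared probesets move in the same direction) could be sketched roughly as follows. This is only a toy illustration in plain Python, not Bioconductor code: the probeset IDs, the expression values, and the helper names log_fold_change and compare_shared are all made up, and the values are assumed to be log2-scale expression measures already produced separately for each platform (for example by running RMA once per chip type).

```python
from statistics import mean

def log_fold_change(expr, groups, case, control):
    """Per-probeset difference of mean log2 expression (case minus control)."""
    lfc = {}
    for probeset, values in expr.items():
        case_vals = [v for v, g in zip(values, groups) if g == case]
        ctrl_vals = [v for v, g in zip(values, groups) if g == control]
        lfc[probeset] = mean(case_vals) - mean(ctrl_vals)
    return lfc

def compare_shared(lfc_a, lfc_b):
    """Return the shared probesets and the subset whose change has the same sign."""
    shared = sorted(set(lfc_a) & set(lfc_b))
    same_direction = [p for p in shared if lfc_a[p] * lfc_b[p] > 0]
    return shared, same_direction

# Hypothetical toy data: each platform was normalized in its own pool.
groups = ["control", "control", "case", "case"]
u95a = {
    "ps1": [7.0, 7.2, 9.0, 9.2],
    "ps2": [5.0, 5.2, 4.0, 4.2],
    "ps_only_a": [6.0, 6.1, 6.0, 6.2],   # not present on the other chip
}
u95av2 = {
    "ps1": [8.0, 8.1, 9.5, 9.6],         # same direction, different absolute level
    "ps2": [6.0, 6.1, 6.5, 6.6],         # opposite direction on this platform
    "ps_only_v2": [4.0, 4.1, 4.2, 4.0],
}

lfc_a = log_fold_change(u95a, groups, "case", "control")
lfc_b = log_fold_change(u95av2, groups, "case", "control")
shared, same_direction = compare_shared(lfc_a, lfc_b)
print("shared probesets:", shared)
print("same direction on both platforms:", same_direction)
```

Comparing only the sign of the change within each platform sidesteps the platform-specific offset in absolute signal; it says nothing about whether the magnitudes are comparable, which is where a statistician's input would still be needed.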
I would begin as Marc suggests, and then explore integrative
correlation as a way to identify a reproducible set of genes in two or
more studies (platforms). See the MergeMaid package and the references
therein. Once you have identified a reproducible set, you may want
to explore the packages metaArray and GeneMeta.
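
To give a feel for the idea behind integrative correlation, here is a minimal sketch in plain Python. It is not the MergeMaid API (MergeMaid is an R package) and it is not a substitute for it; the gene names, the data, and the function names pearson and integrative_correlation are invented for illustration. The assumption is that, for each gene, one computes its correlations with all other shared genes within each study, and then correlates those two correlation profiles across studies. A gene with a high score keeps the same co-expression pattern in both studies.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def integrative_correlation(study_a, study_b):
    """Score each shared gene by how well its gene-gene correlations agree."""
    genes = sorted(set(study_a) & set(study_b))
    scores = {}
    for g in genes:
        others = [h for h in genes if h != g]
        # Correlation profile of gene g within each study separately.
        profile_a = [pearson(study_a[g], study_a[h]) for h in others]
        profile_b = [pearson(study_b[g], study_b[h]) for h in others]
        # Agreement of the two profiles across studies.
        scores[g] = pearson(profile_a, profile_b)
    return scores

# Hypothetical toy data: the studies have different samples but share genes.
study_a = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 4, 3, 2, 1.5],
    "g4": [3, 1, 4, 1, 5],
}
study_b = {
    "g1": [2, 3, 5, 6],
    "g2": [1, 2, 4, 5.5],
    "g3": [9, 7, 4, 3],
    "g4": [6, 5, 2, 1],   # behaves differently here than in study_a
}

scores = integrative_correlation(study_a, study_b)
for gene, score in sorted(scores.items()):
    print(gene, round(score, 3))
```

In this toy example g4 gets a negative score because its relationship to the other genes flips between the two studies, so it would be left out of a reproducible set. With only four genes the scores are very unstable; a real analysis would use thousands of shared genes and the package's own tools for choosing a cutoff.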