[BioC] analysis of HG_U95A vs. HG_U95Av2

Thu Jul 19 14:21:51 CEST 2007

On Jul 19, 2007, at 6:00 AM, bioconductor-request at stat.math.ethz.ch wrote:

> Alex Tsoi wrote:
>> Dear all,
>>
>> I have a cancer dataset from GEO that is labeled as using platform
>> GPL91 (HG-U95A). When I use justRMA() to read the data, I find that
>> the GSMs come from both HG_U95A and HG_U95Av2, and that gives me an
>> error. I could analyze the two sets separately, but I would like to
>> ask whether anyone has experience with, or comments about, the
>> differences between the two platforms, and whether I could treat the
>> data as coming from one platform and analyze them together (e.g. by
>> using RMA). Of course, in that case I would have to "make" R believe
>> that they come from only one platform. Or what is the most proper way
>> to analyze this kind of data?
>>
>> I greatly appreciate any help and comments.
>>
>> P.S.: This is a cancer dataset with two disease states, and samples
>> of each state could come from either HG-U95A or HG_U95Av2.
>>
>>
> This is a difficult problem, since there are platform-specific effects.
> For example, you might think that a probeset shared between the two
> platforms would be safe to compare, but unfortunately it will behave
> slightly differently on one platform than on the other, even though in
> theory it is measuring the same thing.
>
> You could start by normalizing the two array types in separate
> pools.  Then you could take the probesets that are supposedly shared
> between them and see how they behave under their respective
> conditions.  In general, I expect you will find that shared probesets
> move in the same direction on each platform under your experimental
> conditions, but that you get different absolute results on one
> platform than on the other for a given condition.  In other words,
> both the condition and the platform contribute to the overall signal.
> The easiest thing is always to look at one platform at a time, but if
> you *must* combine them, grab a statistician first to help you do
> something sensible.
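
A rough sketch of the check Marc describes: for each probeset shared by
the two platforms, compare the direction of change between the two
disease states, and look at the offset between platforms. It is written
in Python only to keep it self-contained. The probeset IDs, the state
labels "typeA"/"typeB", and all values are invented for illustration,
and it assumes each platform was already normalized in its own pool.

```python
from statistics import mean

# Toy log2 expression values, normalized separately per platform.
# Probeset IDs, state labels, and numbers are hypothetical.
u95a = {
    "probe_1": {"typeA": [7.1, 7.3], "typeB": [8.0, 8.2]},
    "probe_2": {"typeA": [5.0, 5.2], "typeB": [4.1, 4.3]},
    "probe_3": {"typeA": [6.0, 6.1], "typeB": [6.6, 6.4]},
}
u95av2 = {
    "probe_1": {"typeA": [7.8, 7.6], "typeB": [8.9, 8.7]},
    "probe_2": {"typeA": [5.9, 5.7], "typeB": [5.0, 5.2]},
    "probe_4": {"typeA": [9.0, 9.1], "typeB": [9.0, 9.2]},
}


def log_fold_change(values):
    """Mean log2 difference between the two disease states."""
    return mean(values["typeB"]) - mean(values["typeA"])


# Only probesets present on both platforms can be compared at all.
shared = sorted(set(u95a) & set(u95av2))

report = {}
for probe in shared:
    lfc_a = log_fold_change(u95a[probe])
    lfc_v2 = log_fold_change(u95av2[probe])
    report[probe] = {
        # Does the condition move the probeset the same way on both chips?
        "same_direction": (lfc_a > 0) == (lfc_v2 > 0),
        # Shift in absolute signal between platforms within one state.
        "platform_offset": mean(u95av2[probe]["typeA"])
        - mean(u95a[probe]["typeA"]),
    }

for probe, row in report.items():
    print(probe, row)
```

If most shared probesets agree in direction but show a nonzero offset,
that is the pattern described above: condition and platform both
contribute to the signal.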

I would begin as Marc suggests, and then explore integrative
correlation as a way to identify a reproducible set of genes across two
or more studies (here, platforms).  See the MergeMaid package and the
references therein.  Once you have identified a reproducible set, you
may want to explore the packages metaArray and GeneMeta.
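
To give a feel for the idea, here is a simplified sketch of integrative
correlation as a "correlation of correlations": each gene's correlations
with all other genes are computed within each study, and those two
profiles are then correlated across studies. This is not the MergeMaid
implementation. It is plain Python for self-containment, and the gene
names, the data, and the 0.5 cutoff are assumptions made up for
illustration.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def correlation_profile(study, gene, genes):
    """Correlations of one gene with every other gene within a study."""
    return [pearson(study[gene], study[g]) for g in genes if g != gene]


def integrative_correlation(study_a, study_b):
    """Per gene: correlate its within-study profiles across two studies."""
    genes = sorted(set(study_a) & set(study_b))
    return {
        g: pearson(
            correlation_profile(study_a, g, genes),
            correlation_profile(study_b, g, genes),
        )
        for g in genes
    }


# Hypothetical expression values; the studies may differ in sample count.
study_a = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [5, 4, 3, 2, 1],
    "g4": [1, 3, 2, 5, 4],
}
study_b = {
    "g1": [2, 4, 6, 8],
    "g2": [1, 2, 3, 4],
    "g3": [9, 7, 5, 3],
    "g4": [4, 1, 3, 2],
}

scores = integrative_correlation(study_a, study_b)
cutoff = 0.5  # arbitrary illustrative threshold
reproducible = sorted(g for g, s in scores.items() if s > cutoff)
print(scores)
print(reproducible)
```

In this toy data g4 relates to the other genes in opposite ways in the
two studies, so it scores low and drops out of the reproducible set.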

Rob