[BioC] RMA with few arrays

Fri Sep 29 22:29:27 CEST 2006

Hi Ann,

Ann Hess wrote:
> I am wondering if it is appropriate to compute expression indices with RMA 
> for only a small number of arrays (2 or 3).  I have seen in previous posts 
> that the expression indices can be identical for some probe sets across 
> different arrays when using RMA with few arrays (because of the median 
> polish algorithm).  I have observed this for the data in question.  Is it 
> appropriate to just remove those probe sets from downstream analysis?  Is 
> there a problem with computing RMA for such a small group of arrays?

The parameter estimate that is probably not very good in this scenario 
is the probe effect, which is going to be ignored anyway. So is it a 
Really Good Thing? Not really. However, I'm not sure that any other 
method of computing expression values is going to excel in this 
situation, so what are ya gonna do? In a perfect world you would have 
hog-tied the Biologist until (s)he agreed to run more replicates. ;-D

> 
> The data was generated by a scientist interested in comparing three 
> treatments.  However, they ran a single replicate of each treatment and 
> then reproduced the experiment at a later date.  So, there are a total of 
> 6 arrays, but they come from two separate experiments.  Currently, they 
> are just interested in comparing two of the treatments.
> 
> My plan was to run RMA for each of the experiments separately (since I 
> don't think it is appropriate to normalize all the arrays together).  Then 
> combine the results and use a paired t-test to test for differential gene 
> expression for the two treatment groups of interest.

I'm not convinced that you should run rma separately. I would first look 
at the raw data and see if it looks OK to run them all together. A quick 
look at a density plot for each chip is a good start, then you might try 
fitPLM() in affyPLM and look at the results of nuse(pset) and RLE(pset).

Doing a plot of the first two principal components wouldn't hurt either.

If the boxplots that result from nuse() and RLE() look reasonable, the 
density plots line up reasonably closely, and the replicates of each 
sample type group fairly close together on a PCA plot, then I would run 
them all together and be happy that you don't have to wrangle with batch 
effects.
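
A minimal sketch of the density, RLE-style and PCA checks described above, 
using base R only. Everything in it is an assumption made for illustration: 
the matrix x is simulated rather than read from CEL files, the batch shift 
is invented, and the RLE-style boxplot is a rough stand-in computed directly 
on x, not what affyPLM computes from a fitted PLMset. With real data you 
would start from something like the log2 probe intensities of an AffyBatch 
read with ReadAffy().

```r
# Simulated stand-in for log2 intensities: 500 probes x 6 arrays.
# Columns 1-3 are experiment 1 (treatments A, B, C), columns 4-6 experiment 2.
set.seed(1)
n_probes  <- 500
treatment <- factor(rep(c("A", "B", "C"), times = 2))
batch     <- factor(rep(c("exp1", "exp2"), each = 3))

base <- rnorm(n_probes, mean = 8, sd = 2)           # probe-specific baseline
x <- matrix(base, nrow = n_probes, ncol = 6) +      # same baseline in every array
     matrix(rnorm(n_probes * 6, sd = 0.1), nrow = n_probes)
colnames(x) <- paste(treatment, batch, sep = ".")

# Invented batch effect: experiment 2 is shifted up on the first 200 probes.
x[1:200, batch == "exp2"] <- x[1:200, batch == "exp2"] + 1

if (!interactive()) pdf(NULL)   # no plot file is written when run as a script

# 1. One density curve per array, coloured by experiment; they should overlap.
dens <- lapply(seq_len(ncol(x)), function(j) density(x[, j]))
plot(NA, xlim = range(x),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     xlab = "log2 intensity", ylab = "density")
for (j in seq_along(dens)) lines(dens[[j]], col = as.integer(batch)[j])

# 2. RLE-style boxplot: each value minus that probe's median across arrays.
#    Boxes should be centred near 0 with similar spread.
rle_like <- sweep(x, 1, apply(x, 1, median))
boxplot(rle_like, las = 2, ylab = "deviation from probe median")

# 3. PCA with arrays as observations: symbol = experiment, colour = treatment.
pca <- prcomp(t(x))
plot(pca$x[, 1:2], pch = as.integer(batch), col = as.integer(treatment))

# Numeric view of PC1, split by experiment.
pc1_by_batch <- split(pca$x[, 1], batch)
pc1_by_batch
```

In this simulated case the arrays split by experiment on the first principal 
component, which is the pattern that would argue against simply pooling all 
six arrays without accounting for batch.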

> 
> When I tried this approach (using limma and multtest), the results 
> actually looked good until I took a closer look at the RMA values and 
> noticed the identical expression indices across arrays for some probe 
> sets.

The identical expression values are more of an artifact than something 
to be concerned about. Dollars to donuts, if you run all six together, 
these same probesets will have super low variance and won't come up as 
significant anyway (and will likely be filtered out if you filter based 
on variance or IQR or some such thing).
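
To see how identical values can arise, here is a small sketch that uses base 
R's medpolish() as a stand-in for the summarization step of RMA. The 
intensities are made up, and real RMA also background-corrects and 
quantile-normalizes first and uses its own implementation of median polish, 
so treat this as an illustration of the mechanism only.

```r
# Made-up log2 PM intensities for one probe set:
# 3 probes (rows) x 2 arrays (columns).
pm <- matrix(c( 8.0,  8.4,
               10.0,  9.8,
                9.0,  9.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("probe", 1:3), c("array1", "array2")))

# Median polish: pm = overall + probe (row) effect + array (column) effect + residual
fit <- medpolish(pm, trace.iter = FALSE)

# RMA-style summary per array: overall level plus the array effect.
expr <- fit$overall + fit$col
expr

# The array effects are medians of a handful of residuals. Here both medians
# are zero, so the two arrays receive the same summary value even though the
# individual probes disagree between arrays.
fit$col
var(expr)   # zero variance: a variance or IQR filter would drop this probe set
```

With only two or three arrays the medians are taken over very few distinct 
values, so ties like this are far more likely than with a larger set.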

Best,

Jim

> 
> Any suggestions would be greatly appreciated.
> 
> Ann

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
