[BioC] RMA/median polish question

Thu Sep 25 16:24:41 MEST 2003

I agree that the algorithm may converge to different parameter estimates
depending on how you start. However, is one way considered to be more
correct than the other? I sort of assumed that Tukey did it that way
because most of the matrices he was working with had many more columns
than rows.

I never really thought that less than 5-6 chips was enough to estimate
both chip and probe effect parameters, but Rafael Irizarray told me
recently (on this listserv) that the ROC curves in one of his recent
papers came from running rma on only two chips! I figure if he can
publish data using only two chips, I should be able to get away with
using three ;-D

Does anybody else have a thought on the number of chips required for
accurate parameter estimation?

Best,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> Ben Bolstad <bolstad at stat.berkeley.edu> 09/25/03 03:10PM >>>
Two points here:

1. One of the "features" of the median polish algorithm is that it may
converge to different parameter estimates if you sweep rows or columns
first. By convention I have been following the same order as
implemented
in the medpolish function (this ensures exact agreement between
expresso(), rma(), justRMA() implementations of RMA)

2. I am not really sure whether 3 arrays is really sufficient for
estimating both probe and chip effect parameters in the RMA context.

Ben

On Thu, 2003-09-25 at 11:31, James MacDonald wrote:
> Hi All,
> 
> I have a question about the implementation of medpolish in RMA. The
> algorithm involves repeated subtractions of row medians and column
> medians from a matrix of probe intensity values. 
> 
> The problem I have noticed is that if you have an odd number of
chips
> (especially if you only have three chips), you will end up with an
> inordinantly high percentage of expression values that are identical
in
> all three chips. We are talking about 25-30% of the genes when using
> three chips. This is due to the fact that the initial subtraction of
row
> medians results in so many zeros in the matrix that the column
medians
> are then zero. Since the expression value is the overall median plus
the
> column median for that chip, the expression value for that gene will
be
> identical for all chips.
> 
> If we change the median polish algorithm to subtract column medians
> first, we don't have this problem, and the expression values are not
> much different from what we get using the usual algorithm.
> 
> Now I realize that this is more of a philosophical problem rather
than
> a real problem, because it is unlikely that any of  the expression
> values we are talking about would be considered 'differentially
> expressed'. However, this does appear to me to be an unintended
> consequence of using the current implementation, and the fix is a
> trivial change in the code for RMA:
> 
> In the function median_polish, change
> 
> for (iter = 1; iter <= maxiter; iter++){
>     get_row_median(z,rdelta,nprobes,cols);
>     subtract_by_row(z,rdelta,nprobes,cols);
>     rmod(r,rdelta,nprobes);
>     delta = median(c,cols);
>     for (j = 0; j < cols; j++){
>       c[j] = c[j] - delta;
>     }
>     t = t + delta;
>     get_col_median(z,cdelta,nprobes,cols);
>     subtract_by_col(z,cdelta,nprobes,cols);
>     cmod(c,cdelta,cols);
>     delta = median(r,nprobes);
>     for (i =0; i < nprobes; i ++){
>       r[i] = r[i] - delta;
>     }
>     t = t+delta;
>     newsum = sum_abs(z,nprobes,cols);
>     if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>       break;
>     oldsum = newsum;
> 
> 
> To
> 
> for (iter = 1; iter <= maxiter; iter++){
>     get_col_median(z,cdelta,nprobes,cols);
>     subtract_by_col(z,cdelta,nprobes,cols);
>     cmod(c,cdelta,cols);
>     delta = median(r,nprobes);
>     for (i =0; i < nprobes; i ++){
>       r[i] = r[i] - delta;
>     }
>     t = t + delta;
>     get_row_median(z,rdelta,nprobes,cols);
>     subtract_by_row(z,rdelta,nprobes,cols);
>     rmod(r,rdelta,nprobes);
>     delta = median(c,cols);
>     for (j = 0; j < cols; j++){
>       c[j] = c[j] - delta;
>     }
>     t = t+delta;
>     newsum = sum_abs(z,nprobes,cols);
>     if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>       break;
>     oldsum = newsum;
> 
> 
> This appears to me to be a reasonable thing to do, but I am curious
> what others think.
> 
> Regards,
> 
> Jim
> 
> 
> 
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch 
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor 
-- 
Ben Bolstad <bolstad at stat.berkeley.edu>