[BioC] RMA/median polish question

Thu Sep 25 20:09:19 MEST 2003

Two points here:

1. One of the "features" of the median polish algorithm is that it may
converge to different parameter estimates if you sweep rows or columns
first. By convention I have been following the same order as implemented
in the medpolish function (this ensures exact agreement between
expresso(), rma(), justRMA() implementations of RMA)

2. I am not really sure whether 3 arrays is really sufficient for
estimating both probe and chip effect parameters in the RMA context.

Ben

On Thu, 2003-09-25 at 11:31, James MacDonald wrote:
> Hi All,
> 
> I have a question about the implementation of medpolish in RMA. The
> algorithm involves repeated subtractions of row medians and column
> medians from a matrix of probe intensity values. 
> 
> The problem I have noticed is that if you have an odd number of chips
> (especially if you only have three chips), you will end up with an
> inordinantly high percentage of expression values that are identical in
> all three chips. We are talking about 25-30% of the genes when using
> three chips. This is due to the fact that the initial subtraction of row
> medians results in so many zeros in the matrix that the column medians
> are then zero. Since the expression value is the overall median plus the
> column median for that chip, the expression value for that gene will be
> identical for all chips.
> 
> If we change the median polish algorithm to subtract column medians
> first, we don't have this problem, and the expression values are not
> much different from what we get using the usual algorithm.
> 
> Now I realize that this is more of a philosophical problem rather than
> a real problem, because it is unlikely that any of  the expression
> values we are talking about would be considered 'differentially
> expressed'. However, this does appear to me to be an unintended
> consequence of using the current implementation, and the fix is a
> trivial change in the code for RMA:
> 
> In the function median_polish, change
> 
> for (iter = 1; iter <= maxiter; iter++){
>     get_row_median(z,rdelta,nprobes,cols);
>     subtract_by_row(z,rdelta,nprobes,cols);
>     rmod(r,rdelta,nprobes);
>     delta = median(c,cols);
>     for (j = 0; j < cols; j++){
>       c[j] = c[j] - delta;
>     }
>     t = t + delta;
>     get_col_median(z,cdelta,nprobes,cols);
>     subtract_by_col(z,cdelta,nprobes,cols);
>     cmod(c,cdelta,cols);
>     delta = median(r,nprobes);
>     for (i =0; i < nprobes; i ++){
>       r[i] = r[i] - delta;
>     }
>     t = t+delta;
>     newsum = sum_abs(z,nprobes,cols);
>     if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>       break;
>     oldsum = newsum;
> 
> 
> To
> 
> for (iter = 1; iter <= maxiter; iter++){
>     get_col_median(z,cdelta,nprobes,cols);
>     subtract_by_col(z,cdelta,nprobes,cols);
>     cmod(c,cdelta,cols);
>     delta = median(r,nprobes);
>     for (i =0; i < nprobes; i ++){
>       r[i] = r[i] - delta;
>     }
>     t = t + delta;
>     get_row_median(z,rdelta,nprobes,cols);
>     subtract_by_row(z,rdelta,nprobes,cols);
>     rmod(r,rdelta,nprobes);
>     delta = median(c,cols);
>     for (j = 0; j < cols; j++){
>       c[j] = c[j] - delta;
>     }
>     t = t+delta;
>     newsum = sum_abs(z,nprobes,cols);
>     if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>       break;
>     oldsum = newsum;
> 
> 
> This appears to me to be a reasonable thing to do, but I am curious
> what others think.
> 
> Regards,
> 
> Jim
> 
> 
> 
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
-- 
Ben Bolstad <bolstad at stat.berkeley.edu>