[BioC] RMA/median polish question
Ben Bolstad
bolstad at stat.berkeley.edu
Thu Sep 25 20:09:19 MEST 2003
Two points here:
1. One of the "features" of the median polish algorithm is that it may
converge to different parameter estimates depending on whether you sweep
rows or columns first. By convention I have been following the same order
as the medpolish function in R (rows first), which ensures exact
agreement between the expresso(), rma(), and justRMA() implementations
of RMA.
2. I am not sure that 3 arrays are really sufficient for estimating both
probe and chip effect parameters in the RMA context.
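
As a rough illustration of point 1 (not the actual RMA source), here is a
minimal C sketch of median polish with a switch for the sweep order. The
names polish_demo, sweep_rows, sweep_cols, abs_d, and MAX_DIM, the cap of
10 iterations, and the tolerance of 0.01 are all assumptions made for this
example; only the loop structure follows the code quoted below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIM 64 /* illustrative cap on rows and cols (assumption) */

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Median of n values; works on a sorted copy so the input is untouched. */
static double median(const double *x, int n) {
    double buf[MAX_DIM];
    assert(n > 0 && n <= MAX_DIM);
    memcpy(buf, x, (size_t)n * sizeof(double));
    qsort(buf, (size_t)n, sizeof(double), cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Move row medians from z (row-major, rows x cols) into r, then
   recenter the column effects c and fold their median into t. */
static void sweep_rows(double *z, int rows, int cols,
                       double *r, double *c, double *t) {
    for (int i = 0; i < rows; i++) {
        double m = median(z + i * cols, cols);
        for (int j = 0; j < cols; j++) z[i * cols + j] -= m;
        r[i] += m;
    }
    double delta = median(c, cols);
    for (int j = 0; j < cols; j++) c[j] -= delta;
    *t += delta;
}

/* Move column medians from z into c, then recenter the row effects r
   and fold their median into t. */
static void sweep_cols(double *z, int rows, int cols,
                       double *r, double *c, double *t) {
    double col[MAX_DIM];
    for (int j = 0; j < cols; j++) {
        for (int i = 0; i < rows; i++) col[i] = z[i * cols + j];
        double m = median(col, rows);
        for (int i = 0; i < rows; i++) z[i * cols + j] -= m;
        c[j] += m;
    }
    double delta = median(r, rows);
    for (int i = 0; i < rows; i++) r[i] -= delta;
    *t += delta;
}

/* Polishes z in place (it ends up holding residuals) and writes the
   per-chip summary t + c[j] into expr. cols_first selects the order. */
void polish_demo(double *z, int rows, int cols, int cols_first, double *expr) {
    double r[MAX_DIM] = {0}, c[MAX_DIM] = {0}, t = 0.0, oldsum = 0.0;
    assert(rows > 0 && rows <= MAX_DIM && cols > 0 && cols <= MAX_DIM);
    for (int iter = 1; iter <= 10; iter++) {
        if (cols_first) {
            sweep_cols(z, rows, cols, r, c, &t);
            sweep_rows(z, rows, cols, r, c, &t);
        } else {
            sweep_rows(z, rows, cols, r, c, &t);
            sweep_cols(z, rows, cols, r, c, &t);
        }
        double newsum = 0.0;
        for (int k = 0; k < rows * cols; k++) newsum += abs_d(z[k]);
        if (newsum == 0.0 || abs_d(1.0 - oldsum / newsum) < 0.01) break;
        oldsum = newsum;
    }
    for (int j = 0; j < cols; j++) expr[j] = t + c[j];
}
```

For the made-up 3 x 3 matrix (rows are probes, columns are chips)
{8, 7, 11; 11, 9, 8; 9, 15, 10}, sweeping rows first gives 9, 9, 9 for
the three chips, while sweeping columns first gives 9, 9, 10. Both are
valid fixed points of the iteration; they just are not the same one.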
Ben
On Thu, 2003-09-25 at 11:31, James MacDonald wrote:
> Hi All,
>
> I have a question about the implementation of medpolish in RMA. The
> algorithm involves repeated subtractions of row medians and column
> medians from a matrix of probe intensity values.
>
> The problem I have noticed is that if you have an odd number of chips
> (especially if you only have three chips), you will end up with an
> inordinately high percentage of expression values that are identical in
> all three chips. We are talking about 25-30% of the genes when using
> three chips. This is due to the fact that the initial subtraction of row
> medians results in so many zeros in the matrix that the column medians
> are then zero. Since the expression value is the overall median plus the
> column median for that chip, the expression value for that gene will be
> identical for all chips.
>
> If we change the median polish algorithm to subtract column medians
> first, we don't have this problem, and the expression values are not
> much different from what we get using the usual algorithm.
>
> Now I realize that this is more a philosophical problem than a real
> problem, because it is unlikely that any of the expression
> values we are talking about would be considered 'differentially
> expressed'. However, this does appear to me to be an unintended
> consequence of using the current implementation, and the fix is a
> trivial change in the code for RMA:
>
> In the function median_polish, change
>
> for (iter = 1; iter <= maxiter; iter++){
>   /* sweep row medians out of z into r */
>   get_row_median(z,rdelta,nprobes,cols);
>   subtract_by_row(z,rdelta,nprobes,cols);
>   rmod(r,rdelta,nprobes);
>   delta = median(c,cols);
>   for (j = 0; j < cols; j++){
>     c[j] = c[j] - delta;
>   }
>   t = t + delta;
>   /* sweep column medians out of z into c */
>   get_col_median(z,cdelta,nprobes,cols);
>   subtract_by_col(z,cdelta,nprobes,cols);
>   cmod(c,cdelta,cols);
>   delta = median(r,nprobes);
>   for (i = 0; i < nprobes; i++){
>     r[i] = r[i] - delta;
>   }
>   t = t + delta;
>   /* stop once the sum of absolute residuals stabilizes */
>   newsum = sum_abs(z,nprobes,cols);
>   if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>     break;
>   oldsum = newsum;
> }
>
>
> to
>
> for (iter = 1; iter <= maxiter; iter++){
>   /* sweep column medians out of z into c */
>   get_col_median(z,cdelta,nprobes,cols);
>   subtract_by_col(z,cdelta,nprobes,cols);
>   cmod(c,cdelta,cols);
>   delta = median(r,nprobes);
>   for (i = 0; i < nprobes; i++){
>     r[i] = r[i] - delta;
>   }
>   t = t + delta;
>   /* sweep row medians out of z into r */
>   get_row_median(z,rdelta,nprobes,cols);
>   subtract_by_row(z,rdelta,nprobes,cols);
>   rmod(r,rdelta,nprobes);
>   delta = median(c,cols);
>   for (j = 0; j < cols; j++){
>     c[j] = c[j] - delta;
>   }
>   t = t + delta;
>   /* stop once the sum of absolute residuals stabilizes */
>   newsum = sum_abs(z,nprobes,cols);
>   if (newsum == 0.0 || fabs(1.0 - oldsum/newsum) < eps)
>     break;
>   oldsum = newsum;
> }
>
>
> This appears to me to be a reasonable thing to do, but I am curious
> what others think.
>
> Regards,
>
> Jim
>
>
>
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
--
Ben Bolstad <bolstad at stat.berkeley.edu>