[BioC] rcModelMedianPolish of the package preprocessCore/ imitate median polish of RMA

Wed Mar 26 17:04:25 CET 2014

Hi Stefanie,

Please don't take conversations off-list (e.g., use Reply-all when 
responding).

On 3/26/2014 11:47 AM, Stefanie Busch wrote:
> Hi Jim,
> Thank you for the fast answer!
> > I'm not sure median polish is the way to go here. Note that the Affy
> chips all have multiple short probes that are all intended in aggregate
> to measure a single transcript. In other words, for a given transcript,
> there are maybe 4-16 25-mers that are all measuring the same thing, and
> we need some way to summarize the data from all those probes into a
> single statistic that best represents the underlying abundance of the
> transcript. Median polish is a simple way to summarize these data while
> accounting for the fact that there is quite a bit of technical
> variability between the probes that we want to account for (and then
> ignore).
> >On the other hand, the Illumina and Agilent interrogate each transcript
> with a single, longer probe. It may be that there are multiple such
> probes on a given array, but the probes themselves are all the same. So
> there might be technical variability between the probes because of their
> location on the array, but that isn't analogous to the probe-specific
> binding that median polish is intended to account for.
> Does this mean that the summarization via median polish is not 
> nessecary for Illumina or Agilent? I want to compare the results of 
> similar experiments but different microarrays. I think it is necessary 
> for a better comparision to preprocess this data in the same manner.

That's exactly what I mean. You have to understand that different 
platforms require different analysis methods, and you can't just apply 
one method to them all.

If you want to compare results from different platforms, you will likely 
have to process the data separately, and then do a meta-analysis.

Looking at the install page for software and searching for 'meta', I 
come up with the following possibilities:
categoryCompare
GeneMeta
metaArray
RankProd

And a google search will likely bring up even more.

Best,

Jim

> >
> > But I don't know what I should do exacty with the median polish.
> >
> > I use an example matrix which look like this:
> >> matrix
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 18 11 8 21 4
> > [2,] 13 7 5 16 7
> > [3,] 15 6 7 16 6
> > [4,] 19 15 12 18 5
> >What do the rows and columns represent here? Are these measures from the
> same transcript, or different transcripts?
> In my example the rows are different genes and the columns represent 
> different probes/chips (e.g. 1,2 and 3: expression values of a mouse 
> with standard diet and 4,5 for mice with standard diet + resveratrol).
> For Example
>             SD1   SD2    SD3   SDR1   SDR2
> Gen1
> Gen2
> Gen3
> Gen4
> >So in this case your column effect estimates are
> 16.25 9.75 8.00 18.00 5.00
> and your row effect estimates are
> 1.25 -2.75 -1.25 2.75
> So what does this mean? In result I need a new matrix with the 
> expression values which are summarized by median polish?
> Kind regards,
> Stefanie
> *Gesendet:* Mittwoch, 26. März 2014 um 15:58 Uhr
> *Von:* "James W. MacDonald" <jmacdon at uw.edu>
> *An:* "Stefanie Busch [guest]" <guest at bioconductor.org>
> *Cc:* bioconductor at r-project.org, tiffi_88 at web.de, "preprocessCore 
> Maintainer" <bmb at bmbolstad.com>
> *Betreff:* Re: [BioC] rcModelMedianPolish of the package 
> preprocessCore/ imitate median polish of RMA
> Hi Stefanie,
>
> On 3/26/2014 9:35 AM, Stefanie Busch [guest] wrote:
> > Hello,
> >
> > I want to imitate RMA on illumina and agilent chips. There is no 
> problem with background correction, quantile normalization and log2 
> tranformation.
>
> I'm not sure median polish is the way to go here. Note that the Affy
> chips all have multiple short probes that are all intended in aggregate
> to measure a single transcript. In other words, for a given transcript,
> there are maybe 4-16 25-mers that are all measuring the same thing, and
> we need some way to summarize the data from all those probes into a
> single statistic that best represents the underlying abundance of the
> transcript. Median polish is a simple way to summarize these data while
> accounting for the fact that there is quite a bit of technical
> variability between the probes that we want to account for (and then
> ignore).
>
> On the other hand, the Illumina and Agilent interrogate each transcript
> with a single, longer probe. It may be that there are multiple such
> probes on a given array, but the probes themselves are all the same. So
> there might be technical variability between the probes because of their
> location on the array, but that isn't analogous to the probe-specific
> binding that median polish is intended to account for.
>
>
> >
> > But I don't know what I should do exacty with the median polish.
> >
> > I use an example matrix which look like this:
> >> matrix
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 18 11 8 21 4
> > [2,] 13 7 5 16 7
> > [3,] 15 6 7 16 6
> > [4,] 19 15 12 18 5
>
> What do the rows and columns represent here? Are these measures from the
> same transcript, or different transcripts?
>
> >
> > Than I do the median polish
> >> rcModelMedianPolish(matrix)
> > $Estimates
> > [1] 16.25 9.75 8.00 18.00 5.00 1.25 -2.75 -1.25 2.75
> >
> > $Weights
> > NULL
> >
> > $Residuals
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 0.5 0.0 -1.25 1.75 -2.25
> > [2,] -0.5 0.0 -0.25 0.75 4.75
> > [3,] 0.0 -2.5 0.25 -0.75 2.25
> > [4,] 0.0 2.5 1.25 -2.75 -2.75
> >
> > $StdErrors
> > NULL
> >
> > so what should I do next? What are my expression values. I have read 
> that I must substract the Residuals from my original data
>
> I don't know where you might have read that, but it isn't correct. And
> you could hypothetically have answered this yourself by simply reading
> the help page for the function you are using:
>
> Value:
>
> A list with following items:
>
> Estimates: The parameter estimates. Stored in column effect then row
> effect order
>
> So in this case your column effect estimates are
>
> 16.25 9.75 8.00 18.00 5.00
>
> and your row effect estimates are
>
> 1.25 -2.75 -1.25 2.75
>
> But note that this assumes that you want the column effects, and are
> considering the row effects as a nuisance parameter (e.g., the column
> effects are the overall median + column effects, whereas the row effects
> are simply the row effects):
>
> > mat
> [,1] [,2] [,3] [,4] [,5]
> [1,] 18 11 8 21 4
> [2,] 13 7 5 16 7
> [3,] 15 6 7 16 6
> [4,] 19 15 12 18 5
> > z <- medpolish(mat)
> > z
>
> Median Polish Results (Dataset: "mat")
>
> Overall: 9.75
>
> Row Effects:
> [1] 1.25 -2.75 -1.25 2.75
>
> Column Effects:
> [1] 6.50 0.00 -1.75 8.25 -4.75
>
> Residuals:
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0.5 0.0 -1.25 1.75 -2.25
> [2,] -0.5 0.0 -0.25 0.75 4.75
> [3,] 0.0 -2.5 0.25 -0.75 2.25
> [4,] 0.0 2.5 1.25 -2.75 -2.75
>
> > z$over+z$col
> [1] 16.25 9.75 8.00 18.00 5.00
>
> Which is the same result you get from rcModelMedianPolish().
>
> Best,
>
> Jim
>
>
>
> >
> >> MedianPolish<-rcModelMedianPolish(matrix)
> >> matrix-MedianPolish$Residuals
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 17.5 11.0 9.25 19.25 6.25
> > [2,] 13.5 7.0 5.25 15.25 2.25
> > [3,] 15.0 8.5 6.75 16.75 3.75
> > [4,] 19.0 12.5 10.75 20.75 7.75
> >
> >
> > But somewhere else I've read that the first 5 values of the 
> Estimates are my expression values. But in this case I have for 
> example 4 different genes. So this sounds not correct.
> >
> > So what is the right procedure?
> >
> > Best regards
> >
> > Stefanie
> >
> > -- output of sessionInfo():
> >
> > matrix
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 18 11 8 21 4
> > [2,] 13 7 5 16 7
> > [3,] 15 6 7 16 6
> > [4,] 19 15 12 18 5
> >> rcModelMedianPolish(matrix)
> > $Estimates
> > [1] 16.25 9.75 8.00 18.00 5.00 1.25 -2.75 -1.25 2.75
> >
> > $Weights
> > NULL
> >
> > $Residuals
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 0.5 0.0 -1.25 1.75 -2.25
> > [2,] -0.5 0.0 -0.25 0.75 4.75
> > [3,] 0.0 -2.5 0.25 -0.75 2.25
> > [4,] 0.0 2.5 1.25 -2.75 -2.75
> >
> > $StdErrors
> > NULL
> >
> >> MedianPolish<-rcModelMedianPolish(matrix)
> >> matrix-MedianPolish$Residuals
> > [,1] [,2] [,3] [,4] [,5]
> > [1,] 17.5 11.0 9.25 19.25 6.25
> > [2,] 13.5 7.0 5.25 15.25 2.25
> > [3,] 15.0 8.5 6.75 16.75 3.75
> > [4,] 19.0 12.5 10.75 20.75 7.75
> >
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099