# [BioC] expresso: performing RMA on NON-Affy data?

James W. MacDonald jmacdon at med.umich.edu
Mon Apr 27 15:36:39 CEST 2009

```
Hi Robert,

Robert Castelo wrote:
> hi Jim,
>
> the reason is that i'm teaching a course on microarray data analysis to
> students who are not familiar with statistics beyond the basic
> descriptive ones. in front of such an audience it has been helpful for me
> to simulate some data and apply to it the corresponding analysis
> technique when illustrating how the technique works (after that, we
> use it on real data). by simulating data, people see explicitly the
> assumptions made behind the mechanism generating these data so that a
> fraction of them (which makes me already happy) gets to understand why a
> particular method works better than another.

That seems a bit backwards to me - there are no assumptions behind the
mechanism generating these data. They just are what they are. The only
assumptions being made would be the ones you made when simulating: that
the data follow a certain distribution (or a convolution of one or more
distributions).

>
> in the particular case below i'd like to make the point on why the
> median polish summarization method works better than taking the
> arithmetic mean, illustrating somehow what you wisely said about the
> mean being not robust to outliers but being uniformly more powerful for
> Gaussian data, etc etc.

Why not just show some examples of what real data look like? The
Dilution series contains some of the cleanest data around (as it was a
spike-in data set run as carefully as possible), and you can easily see
what I am talking about just by randomly picking a probeset:

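library(affy)      # pm() accessor for AffyBatch objects
library(affydata)  # provides the Dilution example data
library(lattice)   # barchart()
data(Dilution)
# PM intensities for one probeset: rows are probes, columns are the four chips
a <- pm(Dilution, "1007_s_at")
# Per-chip boxplots, with each chip's arithmetic mean overlaid in red
boxplot(a)
points(1:4, colMeans(a), pch = 20, col = "red", cex = 1.2)
# Reshape to long format: one row per probe/chip combination
nam <- factor(rep(colnames(a), each = nrow(a)))
probes <- rep(seq_len(nrow(a)), 4)
dim(a) <- NULL  # flatten the matrix column by column into a vector
b <- data.frame(Values = a, Chip = nam, Probes = factor(probes))
# One panel per chip, one bar per probe
barchart(Values ~ Probes | Chip, data = b)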

>
> i know i can download lots of real data, but i don't know how could i
> demonstrate that a summarization method is better than another with
> real data. using some QC technique (MA plots..) ?? i'll appreciate any
>
> thanks!
> robert.
>

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

```
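
The point discussed in the thread, that the arithmetic mean is not robust to outlying probes while median polish is, can also be sketched on simulated data. The following is only an illustration under stated assumptions: the chip effects, probe affinities, noise level and outlier size are all invented here, and `medpolish()` from base R's stats package stands in for the summarization step (it is not the full RMA pipeline, which also background-corrects and normalizes).

```r
# Simulated log2 probe-level intensities: rows are probes, columns are chips.
# All numbers below are invented for illustration, not taken from Dilution.
set.seed(42)
n_probes     <- 16
chip_effect  <- c(8, 9, 10, 11)           # assumed true log2 expression per chip
probe_effect <- rnorm(n_probes, sd = 1)   # assumed probe affinities
y <- outer(probe_effect, chip_effect, "+") +
  matrix(rnorm(n_probes * 4, sd = 0.1), n_probes, 4)

# Contaminate three probes on chip 2 with large outliers.
y[1:3, 2] <- y[1:3, 2] + 4

# Summarize each chip by the arithmetic mean and by median polish.
mean_summary <- colMeans(y)
fit          <- medpolish(y, trace.iter = FALSE)
mp_summary   <- fit$overall + fit$col

# Compare the estimated chip 2 vs chip 1 difference with the true difference.
true_diff <- chip_effect[2] - chip_effect[1]
mean_err  <- abs((mean_summary[2] - mean_summary[1]) - true_diff)
mp_err    <- abs((mp_summary[2] - mp_summary[1]) - true_diff)
print(c(mean = mean_err, median_polish = mp_err))
```

Comparing differences between chips, rather than the summaries themselves, sidesteps the fact that the two methods center the probe effects differently. Removing the contamination line should bring the two errors close together, which is the other half of the argument: on clean, roughly Gaussian data the mean loses nothing.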