[BioC] Estimates of RMA parameters
Antille,Nicolas,LAUSANNE,NRC-BAS
nicolas.antille at rdls.nestle.com
Thu Aug 7 11:55:19 MEST 2003
Hi!
During background subtraction (with RMA), it is assumed that
the the observed intensities (O) are the sum of a signal(S)
and a noise(N) :
O = S + N
with S follows an exponential ditribution (parameter = alpha) and
N follows a normal distribution (parameters = mu and sigma).
I've tried to assess the quality of these parameters estimates by
simulation. For that, I've chosen the values of the three parameters
and I've generated a large data set (N values):
dataset <- rexp(N,alpha) + rnorm(N,mu,sigma)
After that, i've tried to estimate the values of the parameters like
in RMA:
bg.parameters(dataset, 2^12)
and I've been surprised to discover that the estimates weren't good at all
For example, with alpha=0.0005, mu=500, lambda=75 and N=50000, I've obtained
the following estimates:
alpha = 0.0029
mu = 776.64
sigma = 251.14
This encouraged me to find another way to estimate the parameters. I've
found
it, but I need your opinion:
First, I've computed the theoretical distribution of O (convolution of S and
N).
I've found the following expression:
f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu
- alpha*sigma^2)/(sigma))
where phi is the cumulative distribution function of the normal
distribution.
Then I've computed the corresponding log-likelihood function. The
computation of maximum
log-likelihood is possible but difficult. That's why I propose to solve it
in a different way.
We have: E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.)
mean(intensities)
and Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 =
(appr.) var(intensities)
where we suppose that S and N are independant and where intensities is the
vector of intensities (perfect match).
Then I suggest to make the following substitutions in the log-likelihood
function:
mu ----> mean(intensities) - 1/alpha
sigma^2 ----> var(intensities) - 1/alpha^2
At this point, we have to maximize the log-likelihood which contains just
one unknown parameter : alpha
For the moment, I suggest to find the maximum empirically (compute the
function for a large range of alpha)
In the example, I've computed the function for 5000 values of alpha randomly
selected between 0.000005 and 0.05.
I've obtained the following estimates:
alpha = 0.000498
mu = 501.09
sigma = 113.84
It seems much better than the previous estimates.
I'm impatiently waiting for your comments and criticals.
Thank you in advance!
Nicolas
More information about the Bioconductor
mailing list