[BioC] normalisation assumptions (violation of)
Henrik Bengtsson
hb at stat.berkeley.edu
Tue Aug 8 19:06:46 CEST 2006
On 8/8/06, J.delasHeras at ed.ac.uk <J.delasHeras at ed.ac.uk> wrote:
> Quoting Henrik Bengtsson <hb at maths.lth.se>:
>
> > You haven't told us your platform. What type of scanner do you use?
>
> GenePix 4200AL.
I have no feedback on this specific model, but I'm keen to hear
about your findings.
/Henrik
>
> >> overcome this, so that negative values are turned into an arbitrary
> >> 0.5... and this totally flattened the MA plot, and nothing was
> >
> > Yes, 0.5 is very arbitrary. Why not 5, 0.05, or 0.0000000000005?
> > You might want to look into Kooperberg's background correction
> > methods, or the ones in limma.
>
> actually, I tried other numbers too, just to check that they did not
> have a drastic effect on the final results. I just wanted a positive
> number (actually >1 better, so that I can take logs directly) that is
> low enough so that I get a high M value when I divide the signal of the
> other channel by it. M values of genes that have no detectable signal
> on one channel are meaningless, in that they don't represent any kind
> of fold enrichment... but they're useful to help me pick those genes.
>
>
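To make the point about the arbitrary floor concrete, here is a minimal sketch in Python. The spot values and the helper name log_ratio are made up for illustration and are not taken from any package. It shows that M for a spot with a non-positive signal in one channel is set entirely by the chosen floor.

```python
import math

def log_ratio(red, green, floor):
    """M = log2(R/G) after replacing non-positive signals by `floor`."""
    r = red if red > 0 else floor
    g = green if green > 0 else floor
    return math.log2(r / g)

# Hypothetical spot: clear red signal, background-corrected green below zero.
red, green = 800.0, -12.0

# The same spot, floored at three different arbitrary constants.
for floor in (5.0, 0.5, 0.05):
    print(f"floor={floor:<5} M={log_ratio(red, green, floor):.2f}")
```

Each tenfold decrease of the floor adds log2(10), about 3.32, to M for such a spot. That is fine for flagging these genes, but the magnitude carries no fold-change information.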
> > You haven't told us your platform. What scanner do you have? You
> > might have an offset in your scanner (quite commonly added to prevent
> > negative analogue signals from being truncated to zero), e.g. Axon and
> > Agilent introduce about 20-25 units (which is significant). With a
> > simple scan protocol it is easy to check whether your scanner introduces
> > an offset. The method is described in
> >
> > H. Bengtsson, G. Jönsson and J. Vallon-Christersson, Calibration and
> > assessment of channel-specific biases in microarray data with extended
> > dynamical range, BMC Bioinformatics, 2004, 5:177.
> >
> > and the estimation and calibration methods are in aroma.light. The
> > scanner offset is a global constant, which means that you only fit a
> > single parameter per channel. That is, subtracting this "background"
> > from the foreground signals introduces less noise than subtracting
> > the image-analysis estimated backgrounds unique to each spot. This
> > will leave you with fewer (probably no) non-positive signals. It
> > might also be enough to remove the curvature seen in your raw MA
> > plots. If so, your remaining problem will be how to estimate the
> > overall relative scale factor between the two channels, which is
> > only one parameter; it should be easier than using non-parametric
> > curve-fit methods.
>
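The argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: each channel is modelled as offset + scale * (true signal), the offsets of 20 and 25 units echo the range mentioned above, the scale factors are invented, and the offsets are treated as known. In practice they have to be estimated, e.g. from a multi-scan protocol as in the paper cited above. It does not use aroma.light.

```python
import math
import statistics

def m_value(red, green):
    """Log-ratio M = log2(R/G) for one spot."""
    return math.log2(red / green)

# Assumed affine model per channel: observed = offset + scale * true signal.
offset_r, scale_r = 20.0, 1.0   # hypothetical red channel
offset_g, scale_g = 25.0, 1.6   # hypothetical green channel

# Same true signal in both channels, i.e. no differential expression.
true_signal = [10.0 * 2 ** k for k in range(12)]
red = [offset_r + scale_r * x for x in true_signal]
green = [offset_g + scale_g * x for x in true_signal]

# Raw M drifts with intensity: this is the curvature in the MA plot.
raw_m = [m_value(r, g) for r, g in zip(red, green)]

# Subtract one global constant per channel (not a per-spot background).
corrected_m = [m_value(r - offset_r, g - offset_g) for r, g in zip(red, green)]

# What remains is a single relative scale factor; here it is estimated
# by the median of M, which is one simple choice among several.
shift = statistics.median(corrected_m)
normalized_m = [m - shift for m in corrected_m]

print("range of raw M:       ", max(raw_m) - min(raw_m))
print("range of corrected M: ", max(corrected_m) - min(corrected_m))
print("estimated log2 scale: ", shift)
```

In this idealised, noise-free setting the offset alone produces the intensity-dependent trend in M, and removing it leaves a constant shift of log2(scale_r / scale_g) that a single parameter takes care of.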
> I would like to try your package aroma. I've been meaning to for a
> while. I like your reasoning. But unfortunately my "exploring" time is
> limited. You probably think that it will be a good investment of time
> to dedicate some time now to explore these issues more in depth... and
> I would agree... but unfortunately I am not able. It's not entirely my
> call...
>
> The problem I had with negative signals is enhanced in this particular
> experiment because I happened to have a few slides with abnormally high
> background, mainly on the Cy3 channel. The high background was due to a
> problem in the preparation of the samples. Usually I get pretty clean
> slides. I'm working on repeating the "bad" slides to help solve this.
>
> > When you understand the bits and pieces of what's going on there you
> > will also be much more careful when you pick your normalization
> > method. I would say that curve-fit (loess, lowess, spline, ...)
> > normalization is often overkill and corrects for a symptom rather
> > than fixing the underlying problem. Quantile normalization can be
> > interpreted as a non-parametric method that corrects for affine
> > transformations, but it has a problem at the lower and higher
> > intensities. Variance stabilization methods (Rocke & Durbin, W Huber)
> > have an explicit affine component in their models so they are much
> > more suited to this type of transform. Plain affine normalization
> > (aroma.light) corrects for affine transformation without controlling
> > for variance (on purpose). The estimation methods also differ
> > between the latter two approaches.
> >
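As a rough illustration of why quantile normalization can be read as a correction for affine transformations, here is a minimal sketch in plain Python. The function quantile_normalize and all numbers are hypothetical, ties are ignored, and this is not the implementation found in limma or aroma.light.

```python
def quantile_normalize(channels):
    """Give every channel the same empirical distribution (rank-wise mean)."""
    n = len(channels[0])
    sorted_channels = [sorted(c) for c in channels]
    # Target distribution: mean of the k-th smallest values across channels.
    target = [sum(s[k] for s in sorted_channels) / len(channels) for k in range(n)]
    result = []
    for c in channels:
        order = sorted(range(n), key=lambda i: c[i])  # indices by increasing value
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = target[rank]
        result.append(normalized)
    return result

# Hypothetical noise-free data: two channels that are affine
# transformations of the same underlying signal.
signal = [12.0, 340.0, 55.0, 2100.0, 7.0, 160.0]
red = [20.0 + 1.0 * x for x in signal]
green = [25.0 + 1.6 * x for x in signal]

red_n, green_n = quantile_normalize([red, green])
print(red_n == green_n)
```

An affine transformation with positive scale preserves ranks, so in this noise-free case the two channels agree spot by spot after normalization. With real data the agreement is only in distribution, and the sketch says nothing about the behaviour at the extreme intensities mentioned above.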
> > I hope this is a good start.
>
> As ever, your replies are very useful. I just wish I had a little
> help so that I could spend more time looking at these details in a lot
> more depth. But I will do what I can, and the replies received so far
> are all very useful for me.
>
> Thanks!
>
> Jose
>
>
>
> --
> Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
> The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
> Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
> Swann Building, Mayfield Road
> University of Edinburgh
> Edinburgh EH9 3JR
> UK
>
>
>