[BioC] single-channel GenePix data

Wed Jun 21 23:38:37 CEST 2006

Hi.

If you can assume that your spot intensities "should" be roughly the
same for all your arrays, then I suggest the following:

1) Check the empirical distributions of the log (base 2)-intensities
for all your arrays.  Do they differ a lot at higher intensities?
That indicates a difference in scale between arrays (different scanner
settings, labelling or hybridization efficiency, ...).  Don't worry
too much about the lower intensities for now, because even small
differences will blow up there due to the log-scale.
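
One way to make the comparison in (1) concrete is to tabulate a few quantiles of the log2-intensities per array instead of only eyeballing density plots. The following is a minimal sketch in plain Python (standard library only). The helper name log2_quantiles and the toy signals are made up for illustration; they are not part of any package mentioned here.

```python
import math
import statistics

def log2_quantiles(signals, n=10):
    """Return the n-1 cut points that split the log2 signals into n equal groups."""
    logs = [math.log2(x) for x in signals if x > 0]  # non-positive signals have no log
    return statistics.quantiles(logs, n=n, method="inclusive")

# Hypothetical toy data: array_b is array_a at twice the scale.
array_a = [50, 80, 120, 300, 900, 2500, 7000, 20000, 41000, 60000]
array_b = [2 * x for x in array_a]

qa = log2_quantiles(array_a)
qb = log2_quantiles(array_b)
for p, (a, b) in enumerate(zip(qa, qb), start=1):
    print(f"{10 * p:3d}%  A: {a:6.2f}  B: {b:6.2f}  diff: {b - a:5.2f}")
```

A pure scale difference shows up as a constant shift between the two quantile columns (here one log2 unit); an offset would instead make the shift depend on intensity.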

2) Do some MA plots for random array pairs.  Do you see a curvature at
lower intensities?  That indicates that you have a background/offset
in your data.  Do you see a shift away from M=0 at high intensities?
That is because of different scales.  Note that you might see a
curvature even when the offsets are the same in all arrays, because of
different scales.  You might want to plot all possible pairs in the same
MA plot.  If it looks like the MA clouds converge to the same "point"
at the lower-end of the intensity range, that indicates a common
offset in all arrays.
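
To illustrate the remark in (2) that a common offset combined with different scales already gives curvature, here is a small sketch in plain Python. It uses the usual definitions M = log2(y/x) and A = (log2(x) + log2(y))/2. The offset of 20 and the scale factor of 1.5 are invented toy values, and ma_values is a hypothetical helper name.

```python
import math

def ma_values(x, y):
    """Return (M, A) lists for paired signals, skipping non-positive pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi > 0 and yi > 0]
    m_vals = [math.log2(yi / xi) for xi, yi in pairs]
    a_vals = [0.5 * (math.log2(xi) + math.log2(yi)) for xi, yi in pairs]
    return m_vals, a_vals

# Hypothetical toy data: same underlying signal, both arrays share an
# offset of 20, and array y has a scale factor of 1.5.
true_signal = [5, 20, 100, 1000, 10000, 40000]
x = [20 + t for t in true_signal]
y = [20 + 1.5 * t for t in true_signal]

m_vals, a_vals = ma_values(x, y)
for m, a in zip(m_vals, a_vals):
    print(f"A = {a:6.2f}   M = {m:5.3f}")
```

In this toy case M climbs with A and levels off near log2(1.5) at high intensities, which is the curvature plus shift described above.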

3) Plot pairs of raw (=non-log) spot signals from two random arrays in
an XY plot.  Zoom in at the lower intensities too, e.g. 0-500 or so.
You might want to plot all possible pairs in different colors in the
same scatter plot.  Add a diagonal line too, e.g. abline(a=0,b=1).  Do
the different data clouds (rays) converge toward the origin (0,0) or
not?  If toward (0,0) you have little background/offset in your data.
If toward a different point, you have offset.  If toward a point along
the diagonal you might have an offset in your scanner.
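
The "rays meeting at a point on the diagonal" idea in (3) can also be checked numerically. The sketch below, in plain Python, fits a straight line to each pair of raw signals and computes where it crosses y = x. This is only an illustration under strong assumptions: the data are noise-free toy values, and ordinary least squares is used for simplicity; it is not the estimator of [1], and real data would call for a robust fit. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def diagonal_crossing(intercept, slope):
    """x (= y) where the fitted line meets the diagonal y = x; needs slope != 1."""
    return intercept / (1.0 - slope)

# Hypothetical toy data: three arrays with a common offset of 20 and
# scale factors 1.0, 1.5 and 0.8.
true_signal = [5, 20, 100, 1000, 10000, 40000]
array_1 = [20 + t for t in true_signal]
array_2 = [20 + 1.5 * t for t in true_signal]
array_3 = [20 + 0.8 * t for t in true_signal]

cross_12 = diagonal_crossing(*fit_line(array_1, array_2))
cross_13 = diagonal_crossing(*fit_line(array_1, array_3))
print(round(cross_12, 3), round(cross_13, 3))
```

For this toy data both crossings land at about 20, i.e. the rays meet at the point (20, 20) on the diagonal rather than at the origin.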

4) If you have a common offset in all arrays, you might have
identified a scanner offset [1].  This you can calibrate for if you
scan your arrays at multiple PMT-levels.  See my reply to "[BioC]
multi PMT scan combination" on June 14, 2006
[http://article.gmane.org/gmane.science.biology.informatics.conductor/8998].
If you have already scanned your arrays, a second-best option is to scan
one array multiple times and estimate the offset in the scanner.
Subtract this offset from the spot signals in all your arrays.  This
should work, because we found that the scanner offset was very stable
across arrays [1].

5) Look at (1)-(3) again.  Even when the scanner offset is as low as
10-15 units (on 0-65535) you will see a difference at the lower
intensities.  At this point it might be enough to just rescale the
spot signals to the same average intensity.  Verify by (1)-(3).
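
Steps (4) and (5) together amount to "subtract the scanner offset, then rescale". A minimal sketch in plain Python follows. The function name is hypothetical, the offset is assumed to be known already, and using the mean of the per-array means as the common target is just one possible choice, not something prescribed above.

```python
def rescale_to_common_mean(arrays, offset=0.0):
    """Subtract a known offset, then scale each array to the same mean signal."""
    corrected = [[x - offset for x in arr] for arr in arrays]
    means = [sum(arr) / len(arr) for arr in corrected]
    target = sum(means) / len(means)  # assumed target: average of the array means
    return [[x * target / m for x in arr] for arr, m in zip(corrected, means)]

# Hypothetical toy data: common offset 20, scale factors 1.0 and 1.5.
true_signal = [5, 20, 100, 1000, 10000, 40000]
array_1 = [20 + t for t in true_signal]
array_2 = [20 + 1.5 * t for t in true_signal]

scaled = rescale_to_common_mean([array_1, array_2], offset=20)
for s1, s2 in zip(*scaled):
    print(f"{s1:10.2f} {s2:10.2f}")
```

With the correct offset removed, the two toy arrays agree spot by spot after rescaling; with a wrong offset they would still disagree at the low end, which is what re-checking (1)-(3) is meant to reveal.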

6) If there are still offset effects remaining in (1)-(3), they may
have been added somewhere in the process up to (but excluding) the
scanning.  To correct for such background we have to turn to less
reliable assumptions/modelling.  The simplest model is to assume a
background plus a scale difference (but no higher order terms).
Mathematically this is modelled by an affine function f(x)=a+bx+noise.
Thus, try affine normalization [2] of all your arrays at once.  Since
such a model is not fully identifiable (without spike-ins) there is
one parameter you have to tune by hand/visually.  In practice, the
parameter specifies how much background you allow to be subtracted or
alternatively how many non-positive signals you allow.
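
The trade-off behind that hand-tuned parameter can be made visible by counting how many signals become non-positive for a range of candidate background values. The sketch below, in plain Python, does only that; it is not the affine normalization of [2], and the function name, toy signals and candidate offsets are invented for illustration.

```python
def fraction_nonpositive(signals, offset):
    """Fraction of signals that are <= 0 after subtracting the offset."""
    return sum(1 for x in signals if x - offset <= 0) / len(signals)

# Hypothetical toy signals from one array (raw, non-log scale).
signals = [22, 25, 31, 40, 60, 150, 900, 5000]

candidates = [0, 20, 25, 35]
fractions = [fraction_nonpositive(signals, a) for a in candidates]
for a, f in zip(candidates, fractions):
    print(f"offset {a:3d}: {100 * f:5.1f}% non-positive")
```

Subtracting more background pushes more of the weakest spots to zero or below, so a tolerated fraction of non-positive signals translates into an upper bound on the background estimate.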

7) Look at (1)-(3) again.  It should look better now.  Note however
that at lower log-intensities the non-log signals are very weak and
small shifts may look huge on the log scale.  Don't be afraid of
those.

8) If it still does not look good, we have to turn to other assumptions
beyond the offset and scale differences.  At this point I would try
out the quantile normalization methods (in addition).  If possible,
try one that allows you to set the smoothness of the estimated
quantiles.  This will roughly correspond to estimating f(x)=a + bx +
cx^2 + dx^3 + ... with more and more coefficients.
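
For reference, the basic (unsmoothed) form of quantile normalization can be sketched in a few lines of plain Python. This is the textbook rank-averaging version, not the smoothed variant recommended above and not code from any particular package. It assumes all arrays have the same number of spots and breaks ties by position.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution (mean of sorted values)."""
    n = len(arrays[0])
    if any(len(arr) != n for arr in arrays):
        raise ValueError("all arrays must have the same number of spots")
    sorted_arrays = [sorted(arr) for arr in arrays]
    # Target distribution: average across arrays at each rank.
    target = [sum(col) / len(col) for col in zip(*sorted_arrays)]
    result = []
    for arr in arrays:
        order = sorted(range(n), key=lambda i: arr[i])  # spot indices, low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = target[rank]  # spot keeps its rank, gets the target value
        result.append(out)
    return result

# Hypothetical toy data: two arrays with three spots each.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 9]])
print(normalized)
```

After this step every array has exactly the same set of values, and only the ordering of spots within each array is retained.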

All of the above is explained more or less explicitly in [1] and [2].
Also, I prefer to work with foreground signals only and not do
background subtraction based on image-analysis background estimates.

References:
[1] H. Bengtsson, J. Vallon-Christersson and G. Jönsson, Calibration
and assessment of channel-specific biases in microarray data with
extended dynamical range, BMC Bioinformatics, 5:177, 2004.
[2] H. Bengtsson, G. Jönsson and J. Vallon-Christersson, Calibration
and assessment of channel-specific biases in microarray data with
extended dynamical range, BMC Bioinformatics, 5:177, 2004.

Talks:
http://www.maths.lth.se/bioinformatics/talks/

Software:
The aroma.* packages at http://www.braju.com/R/.

Hope this gives you some ideas on how to proceed.

Henrik

On 6/20/06, Svetlana Bulashevska <s.bulashevska at dkfz-heidelberg.de> wrote:
> Dear colleagues,
> I have single-channel GenePix data, I have managed to read it in with
> the package limma,
> which is designed for two-color data.
> Could you please give me a tip what can I do further to normalize the data
> and to find differentially expressed genes?
> Thank you very much for the help,
> Svetlana Bulashevska.