[BioC] quality assessment and preprocessing for tiling array-based CGH data
Sean Davis
sdavis2 at mail.nih.gov
Wed Oct 22 16:02:35 CEST 2008
On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
> Dear all,
>
> Is there any well-established routine for quality assessment and
> preprocessing of array CGH data, especially tiling array-based CGH data? I
> have found that most quality-assessment methods target expression arrays,
> while few address array CGH data.
> We are using the Agilent 244K rat CGH array, and we now have the text
> files produced by Feature Extraction, but we don't know whether they are of
> good quality. Could anyone provide some clues? Thanks in advance!
>
> After read.maimages(), we got an RGList object, which contains several
> components including R, G, Rb, Gb, and so on. The probes are of three types:
> -1, 1 and 0. 0 means a normal probe; -1 means a negative control, I guess,
> and the probe names are like (-)3xSLv1, NC1_00000002, etc. [no corresponding
> probe sequence]; 1 means a positive control, I guess, and the probe names are
> like DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc. [no
> corresponding probe sequence]. There are 1275 probes of type -1 and 4217 of
> type 1, each with its own R, Rb, G and Gb values. Do we need these values
> for quality assessment and normalization? If so, how?
> In addition, among the normal probes, 1000 are replicated 3 times on the
> array. How could we use these data for quality assessment and
> normalization?
You generally will not want to do any normalization beyond a possible
shift of the center. Any linear normalization that changes the slope
of the M vs. A plot, or any nonlinear normalization, will likely reduce
real signal. As for quality control, a good general measure to track is
the DLRS (derivative log ratio spread), a robust estimate of the standard
deviation of the log ratios.
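dlrs <- function(x) {
  # x: log ratios for a single array, sorted by chromosome and position
  nx <- length(x)
  if (nx < 3) {
    stop("Vector length > 2 needed for computation")
  }
  # Differences between neighboring probes: real copy-number changes
  # largely cancel, leaving mostly measurement noise
  diffs <- diff(x)
  # For independent noise, Var(x[i+1] - x[i]) = 2 * sigma^2, hence sqrt(2);
  # IQR / 1.34 is a robust estimate of the SD for normally distributed data
  dlrs <- IQR(diffs) / (sqrt(2) * 1.34)
  return(dlrs)
}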
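To see why DLRS is robust, here is a small simulation. It is a sketch, not a reference implementation. It assumes independent Gaussian noise with a standard deviation of 0.15 and a single-copy gain over the second half of the probes; both values are made up for illustration. A copy-number step changes only the one difference at the breakpoint, so DLRS stays close to the noise level while the ordinary standard deviation is inflated. The one-line dlrs() below is a compact restatement that gives the same result as the function above for vectors of length 3 or more.

```r
# Compact form of dlrs() above, repeated so this sketch is self-contained.
dlrs <- function(x) IQR(diff(x)) / (sqrt(2) * 1.34)

set.seed(42)
noise_sd <- 0.15   # assumed noise level (illustrative only)
n <- 10000

# Simulated log ratios in genomic order: a single-copy gain
# (log2(3/2), about 0.58) covers the second half of the probes.
x <- rnorm(n, mean = 0, sd = noise_sd)
x[(n / 2 + 1):n] <- x[(n / 2 + 1):n] + log2(3 / 2)

# sd(x) mixes noise with the real copy-number change;
# dlrs(x) should stay close to noise_sd.
print(c(sd = sd(x), dlrs = dlrs(x)))
```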
For Agilent arrays, most DLRS values should generally be around or below
0.2, although this may vary somewhat from lab to lab. In any case, an array
whose DLRS is a clear outlier relative to the others is suspect. The input
to the above function is the vector of log ratios for a single array,
sorted by chromosome and then by position.
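DLRS is most useful for spotting outlier arrays, so it can help to compute it for every array in a batch. The following is only an illustrative sketch. The helpers center_arrays() and array_dlrs() are hypothetical, and so are the simulated data. It also assumes that control probes have already been removed and that chromosomes are coded as integers. The sketch sorts probes by chromosome and position, computes DLRS per array, and flags arrays above the rough 0.2 guideline. The median-centering helper implements the center shift mentioned earlier. A constant shift cancels in adjacent differences, so centering does not change DLRS.

```r
# Compact restatement of dlrs() above so this sketch runs on its own;
# na.rm = TRUE (an addition) drops differences involving NA values.
dlrs <- function(x) IQR(diff(x), na.rm = TRUE) / (sqrt(2) * 1.34)

# Hypothetical helper: shift each array's center by subtracting its median
# log ratio (the only normalization suggested above).
center_arrays <- function(M) {
  sweep(M, 2, apply(M, 2, median, na.rm = TRUE))
}

# Hypothetical helper: per-array DLRS for a probes x arrays matrix of log
# ratios. Probes are sorted by chromosome, then position, because DLRS is
# built from differences between neighboring probes.
array_dlrs <- function(M, chr, pos) {
  ord <- order(chr, pos)
  apply(M[ord, , drop = FALSE], 2, dlrs)
}

# Simulated example: three arrays of 5000 probes in shuffled genomic order;
# A3 is deliberately noisier than the others.
set.seed(1)
n <- 5000
chr <- sample(1:20, n, replace = TRUE)
pos <- sample.int(1e8, n)
M <- cbind(A1 = rnorm(n, 0.10, 0.12),
           A2 = rnorm(n, -0.05, 0.12),
           A3 = rnorm(n, 0.00, 0.40))

d <- array_dlrs(M, chr, pos)
flagged <- names(d)[d > 0.2]   # 0.2: the rough Agilent guideline above
print(round(d, 3))
print(flagged)

# Centering cannot change DLRS: a constant shift cancels in the differences.
stopifnot(isTRUE(all.equal(d, array_dlrs(center_arrays(M), chr, pos))))
```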
Sean