[BioC] quality assessment and preprocessing for tiling array-based CGH data
Leon Yee
yee.leon at gmail.com
Wed Oct 22 16:32:34 CEST 2008
Sean Davis wrote:
> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
>> Dear all,
>>
>> Is there any well-established routine for quality assessment and
>> preprocessing of array CGH data, especially tiling array-based CGH data?
>> Most of the quality assessment methods I have found are for expression
>> arrays, and few address array CGH data.
>> We are using the Agilent 244k rat CGH array. We now have the text
>> files produced by Feature Extraction, but we don't know whether they are
>> of good quality. Could anyone provide some clues? Thanks in advance!
>>
>> After read.maimages(), we got an RGList object, which contains several
>> components including R, G, Rb, Gb, and so on. The probes are of 3 types:
>> -1, 1 and 0. 0 means a normal probe. -1 means negative control, I guess;
>> those probe names look like (-)3xSLv1, NC1_00000002, etc. [no
>> corresponding probe sequence]. 1 means positive control, I guess; those
>> probe names look like DarkCorner, DCP_008001.0, RnCGHBrightCorner,
>> SRN_800002, etc. [no corresponding probe sequence]. There are 1275 probes
>> of type -1 and 4217 of type 1, each with its own R, Rb, G and Gb values.
>> Do we need these values for quality assessment and normalization? If so,
>> how?
>> In addition, among the normal probes, 1000 probes are each repeated 3
>> times on the array. How could we use these data for quality assessment
>> and normalization?
>
> You generally will not want to do any normalization besides a possible
> shift of the center. Any linear normalization that affects the slope
> of the M vs. A plot or nonlinear normalization will likely decrease
> signal. As for quality control, a good, general measure to track is
> the dlrs, a robust measure of the standard deviation.
>
>
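> dlrs <- function(x) {
>   # x: log ratios for one array, in chromosome and position order
>   nx <- length(x)
>   if (nx < 3) {
>     stop("Vector length>2 needed for computation")
>   }
>   # differences between adjacent probes
>   tmp <- embed(x, 2)
>   diffs <- tmp[, 2] - tmp[, 1]
>   # rescale the IQR of the differences to a standard deviation
>   dlrs <- IQR(diffs) / (sqrt(2) * 1.34)
>   return(dlrs)
> }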
>
> For Agilent arrays, most dlrs values should generally be around or
> under 0.2. However, this might vary a bit from lab to lab. In any case,
> an array whose dlrs is a significant outlier is suspect. The input to
> the above function is the log ratios for a single array, arranged in
> chromosome and position order.
>
> Sean
>
Hi, Sean
Thanks for your advice. However, I still have several questions:
1. The input of dlrs is the log ratios. Should these be the log ratios
extracted from the text file produced by Feature Extraction, or the ones
calculated from RGList --> MAList? I have searched the mailing list and
seen a post of yours that mentioned the difference between the log ratio
from Feature Extraction and the default M value from read.maimages.
2. I can get the log ratios of all features, including those of control
type -1 and 1, but these features don't have chromosome positions. Does
this mean I don't need any of them for quality assessment?
3. Some probes with names like "chr2_random:xxxxx-yyyyyy" will not
get a proper mapping on the chromosome, so I should remove these values
from the input of dlrs. Is that right?
4. How should I handle those 1000 probes that are repeated 3 times? Each
group of three will map to the same chromosome position.
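
To make questions 2 to 4 concrete, here is a minimal sketch of what I have
in mind, run on made-up data. The data frame `probes` and its column names
(controlType, chr, pos, logRatio) are hypothetical and are not what
read.maimages returns. Dropping the control and "_random" probes and taking
the median of the replicates are only my guesses at a reasonable approach,
not an established routine.

```r
# Sketch on simulated data; `probes` and its columns are hypothetical.
dlrs <- function(x) {
  # Same calculation as the dlrs function quoted above, using diff()
  if (length(x) < 3) stop("Vector length>2 needed for computation")
  IQR(diff(x)) / (sqrt(2) * 1.34)
}

set.seed(1)
n <- 1500
probes <- data.frame(
  controlType = rep(c(0, 0, 0, 1, -1), length.out = n),
  chr         = rep(c("chr1", "chr2", "chr2_random"), length.out = n),
  pos         = sample.int(1e6, n),
  logRatio    = rnorm(n, sd = 0.15),
  stringsAsFactors = FALSE
)
# Control features carry no chromosome annotation
probes$chr[probes$controlType != 0] <- NA

# Simulate replicated probes: the first 30 rows appear three times in total
dup <- probes[rep(1:30, times = 2), ]
dup$logRatio <- dup$logRatio + rnorm(nrow(dup), sd = 0.05)
probes <- rbind(probes, dup)

# Questions 2 and 3: keep only normal probes with a proper chromosome mapping
keep <- probes$controlType == 0 &
  !is.na(probes$chr) &
  !grepl("_random", probes$chr)
kept <- probes[keep, ]

# Question 4 (one possible choice): one median log ratio per position
collapsed <- aggregate(logRatio ~ chr + pos, data = kept, FUN = median)

# dlrs expects chromosome and position order
collapsed <- collapsed[order(collapsed$chr, collapsed$pos), ]

qc <- dlrs(collapsed$logRatio)
print(qc)
```

Note that order() on the chr strings sorts them alphabetically (chr10
before chr2), which only changes the few differences taken across
chromosome boundaries.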
Regards,
Leon