[BioC] Agilent CGH data

Tue Sep 25 21:55:53 CEST 2007

Quoting Sean Davis <sdavis2 at mail.nih.gov> on Tue 25 Sep 2007 17:50:10 BST:

> Sean Davis wrote:
> > jhs1jjm at leeds.ac.uk wrote:
> >> R 2.5.0 on openSUSE 10.2 x86_64.
> >>
> >> Hi,
> >>
> >> I'm using the arrayQuality package to analyse 3 44k Agilent CGH arrays
> >> with the aim of identifying regions of gain/loss.
> >>
> >> With the HTML report generated using the agQuality function I'm not
> >> getting the coloured loess curve on the MA plot for raw M. Additionally,
> >> I'm only getting 1 value for the dot plot of controls normalized M values
> >> (-)3xLv1 (n=330) and likewise for the control A values. Alternatively,
> >> when I run the maQualityPlots function on my mraw object created in
> >> marray I get these but don't get the comparative box plot.
> >>
> >> Firstly, is this important? I'm unsure of how useful the comparative
> >> boxplots are, as some values are NA. Secondly, is this an appropriate
> >> tool to use, and are there any others that may be of more use both for
> >> quality control and for analysis further down the line? Thank you kindly
> >> for any input.
> >
> > Hi, John.  Are these CGH arrays or expression arrays?  The two probably
> > need some different treatment.  You imply you are using CGH arrays in
> > looking for regions of gain/loss.  Is this the case?
>
> And, then, of course, there is the subject, "Agilent CGH data"--SORRY!
>
> In this case, you do not want to rely on loess or other non-linear
> normalization methods.  Also, the MA plots for the best arrays DO show a
> positive slope--this is totally expected and sought after.  In other
> words, with higher M-values, we expect higher A-values.
>
> We have found that a pretty good measure of quality of CGH arrays is the
> dlrs:
>
> dlrs <-
>   function(x) {
>     nx <- length(x)
>     if (nx<3) {
>       stop("Vector length>2 needed for computation")
>     }
>     tmp <- embed(x,2)
>     diffs <- tmp[,2]-tmp[,1]
>     dlrs <- IQR(diffs)/(sqrt(2)*1.34)
>     return(dlrs)
>   }
>
> Run this on the Log ratios (ordered by chromosome and position).  Good
> values are less than 0.2 or so, but even some slightly higher can be used.
>
> As for analysis, you may want to look into the snapCGH package, as it
> allows multiple analyses to be run with the same data structures.
>
> Sean
>
Hi Sean,

I'd been using the loess method with the marray package; I was going by a paper
I'd read comparing the Agilent Feature Extraction software with other
preprocessing methods (Zahurak et al. 2007, I think). The MA plot for the raw
intensities does show a positive slope. In this case, which normalization method
should I use?

I ran dlrs and got the following:

> qual <- dlrs(CNA.object[,3])
> qual
[1] 0.5586258
> qual2 <- dlrs(CNA.object[,4])
> qual2
[1] 0.5778217
> qual3 <- dlrs(CNA.object[,5])
> qual3
[1] 0.5625572

As you can see, I used the CNA.object, for which you kindly provided a function
to separate the chromosome from the location and order them. Having done that, I
realized that the log2 ratios I'd used were from the mnorm object (mnorm@maM),
so I went back and created a second CNA.object as follows:

> CNA.object2 <- CNA(logratio, agilentInfo$chromosome, agilentInfo$location,
+                    data.type="logratio")

and then got the following:

> qual <- dlrs(CNA.object2[,3])
> qual
[1] 0.5802947
> qual2 <- dlrs(CNA.object2[,4])
> qual2
[1] 0.6258332
> qual3 <- dlrs(CNA.object2[,5])
> qual3
[1] 0.5925305
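
The per-column calls above can also be run in one pass. What follows is only a minimal sketch, not code from this thread: it assumes a plain data frame standing in for the CNA object (the names `cgh`, `array1` to `array3` and all simulated values are hypothetical), and it adds NA removal to dlrs, which the posted function does not do. The scaling in dlrs rests on two facts: for normal noise the IQR is about 1.34 standard deviations, and the difference of two independent values has a standard deviation sqrt(2) times that of each value. So on data ordered by chromosome and position, dlrs (usually expanded as "derivative log ratio spread") should land near the probe-to-probe noise SD, largely unaffected by the occasional step at a gain or loss.

```r
# dlrs as posted earlier in the thread, with NA removal added (an assumption).
dlrs <- function(x) {
  x <- x[!is.na(x)]                 # assumption: drop missing log ratios first
  if (length(x) < 3) {
    stop("Vector length>2 needed for computation")
  }
  diffs <- diff(x)                  # neighbouring-probe differences; same spread
                                    # as the embed() version (only the sign flips)
  IQR(diffs) / (sqrt(2) * 1.34)     # robust spread of diffs, rescaled to one probe
}

# Hypothetical stand-in for a CNA-style object:
# chromosome, position, then one log-ratio column per array.
set.seed(1)
n <- 5000
cgh <- data.frame(
  chrom  = rep(1:2, each = n / 2),
  maploc = rep(seq_len(n / 2), times = 2)
)

# One simulated single-copy gain on part of chromosome 2.
truth <- ifelse(cgh$chrom == 2 & cgh$maploc > 1500, 0.58, 0)
cgh$array1 <- truth + rnorm(n, sd = 0.2)   # quiet array
cgh$array2 <- truth + rnorm(n, sd = 0.2)   # quiet array
cgh$array3 <- truth + rnorm(n, sd = 0.6)   # noisy array
cgh$array1[c(10, 200)] <- NA               # a few missing values

# Scramble the rows, then restore genomic order before computing dlrs;
# the measure is only meaningful on neighbouring probes.
cgh <- cgh[sample(n), ]
cgh <- cgh[order(cgh$chrom, cgh$maploc), ]

# One dlrs value per array column (everything after chrom and maploc).
quality <- sapply(cgh[, -(1:2)], dlrs)
print(round(quality, 2))
```

In this simulation the three values should come out close to the simulated noise SDs (roughly 0.2, 0.2 and 0.6), despite the gain on chromosome 2. With a real CNA object the same `sapply` call would be applied to the sample columns, for example columns 3 to 5 here.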

Does this mean the quality of the arrays is poor, given that all of these values
are well above 0.2? And how would I go about citing the dlrs measure (for my
dissertation)?

Thanks again

John