[BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan

White, Peter Peter.White at nationwidechildrens.org
Tue Mar 16 13:48:17 CET 2010


Dear Gordon,

Thanks so much for your detailed response. I did try setting the span to  and it definitely improved the J-curve. Scroll all the way to the bottom of:

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27771

I also tried setting the span to 0.01 and it looked even better (lower than that and crazy things started to happen):

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27770

As far as I could see lowering the span had no effect on the elements less than an average log2 intensity of 16. I did try adding a weight function that elements with a median red or green intensity > 60,000 were weighted 5, but it made no discernable difference to the default normalization.

All the best,

Peter



> -----Original Message-----
> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Monday, March 15, 2010 6:54 PM
> To: White, Peter
> Cc: 'Bioconductor mailing list'
> Subject: RE: [BioC] Issue with limma and normalization of Agilent data
> generated with a 20-bit scan
> 
> Dear Peter,
> 
> The plots on the permalink website do not correspond to the R code in
> your
> email.  Apparently the plots are from marray package, produced by R
> code
> which you do not give.  I can't comment on the behaviour of someone
> else's
> package.
> 
> As I've already told you, normalizeWithinArrays() uses all the points,
> and
> having an A-value over 16 is not an issue.  However, the loess curve is
> designed to be quite stiff, and also to ignore outliers, and therefore
> the
> curve will not follow a highly localized J-curve at the right end of
> the
> range, which is what your MA-plot seems to show.  The loess curve is
> designed to follow the main trend, not local trends involving small
> proportions of the points.  If you believe that your platform has a
> different MvsA relationship for A>16 than for A<16, and this should be
> removed by the normalization curve, then you can do one of two things.
> One possibility is simply to make the curve more local by reducing the
> span paramater:
> 
>   normObj <- normalizeWithinArrays(bgObj, method="loess", span=0.1)
> 
> Choosing span small enough will certainly remove the J-curve.  However
> I
> don't recommend this approach, as it is not specific to A>16.  I
> recommend
> instead that you use the modifyWeights() function to give the spots
> with
> A>16 increased weights, so the loess curve will follow it more closely.
> You might find a combination of slightly reduced span and increased
> weights will work best.
> 
> BTW, I would recommend that you use spottypes to display control spots,
> and weights to downweight them in the loess normalization, rather than
> hacking NA values into your data.  It gives you more information and
> flexibility.
> 
> Best wishes
> Gordon
> 
> On Mon, 15 Mar 2010, White, Peter wrote:
> 
> > Dear Gordon,
> >
> > The plots are visible in the blog view on gmane.org:
> >
> >
> http://permalink.gmane.org/gmane.science.biology.informatics.conductor/
> 27731
> >
> > I thought you may be on to something with the weights but I tried it
> > with and without a flag function (also double checked the Agilent
> file
> > and the high intensity spots are not flagged). It really does look
> like
> > the loess is just not fitted beyond for elements with an A value >
> 16???
> > These 20-bit scans from Agilent are quite new and I suspect most
> folks
> > with just use the Agilent normalized data rather than starting with
> the
> > raw data, so maybe this just hasn't been observed before now?
> >
> > Thanks,
> >
> > Peter
> >
> > Below is the code I used:
> >
> >   library(limma)
> >    agilentFiles <- list.files(pattern="U")
> >    rawObj <- read.maimages(agilentFiles,
> >          columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal",
> >          R = "rMedianSignal", Rb = "rBGMedianSignal"),
> >          annotation= c("ProbeName", "SystematicName","ControlType"))
> > #Remove spike controls and remove background signals
> >    bgObj <- rawObj
> >    posControls <- grep(T,rawObj$genes$ControlType == 1)
> >    bgObj$G[posControls,] <- NA
> >    bgObj$R[posControls,] <- NA
> >    bgObj$Gb <- bgObj$Rb <- NULL
> > #Loess normalize
> >    normObj <- normalizeWithinArrays(bgObj, method="loess",
> weights=NULL)
> > #Plot MvA
> >    for (i in 1:ncol(normObj)) {
> >       figureName <- paste(i, " MvA Plots")
> >       mat <- matrix(c(3,1,2),nrow=3,ncol=1)
> >       layout(mat,heights=c(1,10,10))
> >       plotMA(rawObj, array=i, main = "Pre-Normalization MvA",
> >            ylim=c(-3.5,3.5), zero.weights=TRUE)
> >            abline(0,0)
> >       plotMA(normObj, array=i, main = "Normalized MvA",
> >            ylim=c(-3.5,3.5), zero.weights=TRUE)
> >            abline(0,0)
> >            layout(1)
> >       mtext(figureName, cex=1.25, line=3)
> >       savePlot(filename=figureName, type=c("png"), device=dev.cur())
> >    }
> >
> >> sessionInfo()
> > R version 2.10.1 (2009-12-14)
> > i386-pc-mingw32
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] grDevices datasets  splines   graphics  stats     tcltk     utils
> > [8] methods   base
> >
> > other attached packages:
> > [1] limma_3.2.2     svSocket_0.9-48 TinnR_1.0.3     R2HTML_1.59-1
> > [5] Hmisc_3.7-0     survival_2.35-9
> >
> > loaded via a namespace (and not attached):
> > [1] cluster_1.12.1 grid_2.10.1    lattice_0.18-3 svMisc_0.9-56
> tools_2.10.1
> >
> >> -----Original Message-----
> >> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
> >> Sent: Saturday, March 13, 2010 6:39 PM
> >> To: White, Peter
> >> Cc: Bioconductor mailing list
> >> Subject: [BioC] Issue with limma and normalization of Agilent data
> >> generated with a 20-bit scan
> >>
> >> Dear Peter,
> >>
> >> You can't send attachments to the Bioconductor mailing list, so I
> have
> >> not
> >> seen your plots.  However I am not aware of any issue such as you
> >> describe.  The limma function normalizeWithinArrays includes all
> spots
> >> in
> >> the normalization, regardless of how large the A-value is.  You
> haven't
> >> shown us any code, or any problem we can reproduce, so we can't tell
> >> whether or not you're doing something wrong.  We don't know whether
> >> you're
> >> using probe weights, whether you've filtered control spots, etc etc.
> >>
> >> Best wishes
> >> Gordon
> >>
> >>> Date: Fri, 12 Mar 2010 10:21:41 -0500
> >>> From: "White, Peter" <Peter.White at nationwidechildrens.org>
> >>> To: "'bioconductor at stat.math.ethz.ch'"
> >>>     <bioconductor at stat.math.ethz.ch>
> >>> Subject: [BioC] Issue with limma and normalization of Agilent data
> >>>     generated with a 20-bit scan
> >>> Content-Type: text/plain
> >>>
> >>> I have noticed an issue with the limma normalizeWithinArrays
> function
> >>> (and also with marray and maNorm). When normalizing two color data
> >>> generated with an Agilent 20-bt scanner it fails to normalize the
> >> high
> >>> intensity data (i.e. any points with an A value > 16). In our
> dataset
> >> we
> >>> have in excess of 400 elements with red and green intensities
> ranging
> >>> from 65500 to 475100. When we loess normalize the data any points
> >> beyond
> >>> A=16 appear to be untouched by the normalization. If the attached
> >>> figures come through this should be clear - when using maNorm and
> >> maPlot
> >>> it will plot the loess line and you can see it stop at 16.
> >>>
> >>> Is it possible for loess normalization to be extended to this
> higher
> >>> intensity data? Or am I just doing something wrong?
> >>>
> >>> Thanks,
> >>>
> >>> Peter
> >>>
> >>>
> >>> Peter White, Ph.D.
> >>> Director, Biomedical Genomics
> Core<http://genomics.nchresearch.org/>
> >>> Research Assistant Professor of Pediatrics
> >>> The Research Institute at
> >>> Nationwide Children's Hospital and
> >>> The Ohio State University
> >>>
> >>> Mailing Address:
> >>>
> >>> The Research Institute at
> >>> Nationwide Children's Hospital
> >>> 700 Children's Drive, W510
> >>> Columbus, OH 43205
> >>>
> >>> Assistant (Jennifer Neelans): (614) 722-2915
> >>> Office: (614) 355-2671
> >>> Lab: (614) 355-5252
> >>> Fax: (614) 722-2818
> >>> Web: http://genomics.nchresearch.org/
> 
> ______________________________________________________________________
> The information in this email is confidential and intended solely for
> the addressee.
> You must not disclose, forward, print or use it without the permission
> of the sender.
> ______________________________________________________________________
----------------------------------------- Confidentiality Notice:
The following mail message, including any attachments, is for the
sole use of the intended recipient(s) and may contain confidential
and privileged information. The recipient is responsible to
maintain the confidentiality of this information and to use the
information only for authorized purposes. If you are not the
intended recipient (or authorized to receive information for the
intended recipient), you are hereby notified that any review, use,
disclosure, distribution, copying, printing, or action taken in
reliance on the contents of this e-mail is strictly prohibited. If
you have received this communication in error, please notify us
immediately by reply e-mail and destroy all copies of the original
message. Thank you.



More information about the Bioconductor mailing list