[BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Mon Mar 15 23:30:05 CET 2010
I think what Wolfgang is saying is that the data is so affected by technical bias at the tail that even if you could get loess normalisation to get that tail straight, you might not want to believe anything that comes from there as the data is unreliable.
I have no idea why the ready-built functions don't touch your tail, but loess normalisation isn't *that* complicated a procedure - you should be able to fit a model using the loess() function and do the normalisation yourself.
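As a rough illustration of doing the normalisation by hand, here is a minimal sketch in base R only. The intensities are simulated and the bias curve is made up; span = 0.3, degree = 1 and the robust family are choices for this example, not necessarily what the ready-built functions use.

```r
# Minimal sketch: loess normalisation of M on A using base R only.
# R and G are simulated here; in practice they would be the red and
# green foreground intensities of a single array.
set.seed(1)
n      <- 2000
A_true <- runif(n, 6, 19)               # average log2 intensity
bias   <- 0.02 * (A_true - 12)^2        # made-up intensity-dependent dye bias
M_raw  <- bias + rnorm(n, sd = 0.2)     # log-ratio = bias + noise
R <- 2^(A_true + M_raw / 2)
G <- 2^(A_true - M_raw / 2)

# M and A values as used in an MvA plot
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Fit the trend of M against A; family = "symmetric" makes the fit robust
# to outliers, span is the fraction of points used in each local fit
fit <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric")

# Normalised log-ratios: subtract the fitted trend at each spot's A value
M_norm <- M - predict(fit, newdata = data.frame(A = A))

# Compare the high-intensity tail before and after
mean(M[A > 16])
mean(M_norm[A > 16])
```

If the tail is still not straight, refitting with a smaller span and plotting the fitted curve over the MvA plot should show whether the fit reaches the high-A points at all.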
________________________________________
From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Wolfgang Huber [whuber at embl.de]
Sent: 15 March 2010 22:03
To: White, Peter
Cc: 'Gordon K Smyth'; 'Bioconductor mailing list'
Subject: Re: [BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan
Dear Peter
what is the "saturation point"?
Non-linear response / saturation may occur even well below the nominal maximal value (2^20-1) of the detector, and perhaps this need not even be related to the detector, but rather to other steps in the process. How else do you explain the shape of the data before normalisation? (Try also looking at the data in the normal scatterplot.)
Best wishes
Wolfgang
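A minimal sketch of the scatterplot check, in base R. The data are simulated, and the soft saturation of the red channel (and its 3e5 scale) is purely hypothetical, chosen only so that the response bends well below the nominal maximum; with real data, R and G would come from the array.

```r
# Simulated stand-ins for one array's red and green foreground intensities
set.seed(2)
n      <- 1000
signal <- 2^runif(n, 6, 19)

# Hypothetical soft saturation of the red channel only (illustration)
R <- signal / (1 + signal / 3e5) * 2^rnorm(n, sd = 0.1)
G <- signal * 2^rnorm(n, sd = 0.1)

# Plain scatterplot of the two channels on the log2 scale
plot(log2(G), log2(R), pch = ".", xlab = "log2 G", ylab = "log2 R")
abline(0, 1, col = "grey")               # line where R equals G

# Nominal maximum of a 20-bit detector, for reference
nominal_max <- log2(2^20 - 1)
abline(h = nominal_max, v = nominal_max, lty = 2)
```

In this simulation the points bend away from the diagonal long before they reach the dashed lines, which is the kind of pattern to look for in the real data.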
On Mar 15, 2010, at 10:47 PM, White, Peter wrote:
Hi Wolfgang,
So with the new scanner from Agilent this data is not saturated. The scanner went from 16-bit (0-65,535) to 20-bit (0-1,048,575). All of these values are well below the new saturation point, yet they are not being normalized.
Thanks,
Peter
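For reference, the A-values that correspond to these detector limits can be worked out directly. The short sketch below is arithmetic only (the helper name a_value is made up for this example); it shows that A = 16 is where both channels sit at the old 16-bit maximum, but it does not by itself explain why the fit appears to stop there.

```r
# A-value (average log2 intensity) of a spot with intensities R and G
a_value <- function(R, G) (log2(R) + log2(G)) / 2

max16 <- 2^16 - 1   # largest value from a 16-bit scan: 65535
max20 <- 2^20 - 1   # largest value from a 20-bit scan: 1048575

a_value(max16, max16)     # just under 16
a_value(max20, max20)     # just under 20

# Intensity range quoted in Peter's original post for the bright spots
a_value(65500, 65500)     # just under 16
a_value(475100, 475100)   # between 18 and 19
```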
> -----Original Message-----
> From: Wolfgang Huber [mailto:whuber at embl.de]
> Sent: Monday, March 15, 2010 5:25 PM
> To: White, Peter
> Cc: 'Gordon K Smyth'; 'Bioconductor mailing list'
> Subject: Re: [BioC] Issue with limma and normalization of Agilent data
> generated with a 20-bit scan
>
>
> Dear Peter
>
> have you tried with different (i.e. smaller) values of the "span"
> parameter for the loess fit?
>
> The data seem badly saturated... I'd prefer avoiding the kind of
> saturation seen in the data you posted by better settings of the
> scanner, rather than doing post hoc loess normalisation.
>
> Best wishes
> Wolfgang
>
>
> White, Peter wrote 15/03/10 15:53:
>> Dear Gordon,
>>
>> The plots are visible in the blog view on gmane.org:
>>
>> http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27731
>>
>> I thought you may be on to something with the weights, but I tried it
>> with and without a flag function (and also double-checked the Agilent
>> file: the high intensity spots are not flagged). It really does look
>> like the loess is just not fitted for elements with an A value > 16.
>> These 20-bit scans from Agilent are quite new and I suspect most folks
>> will just use the Agilent normalized data rather than starting with the
>> raw data, so maybe this just hasn't been observed before now?
>>
>> Thanks,
>>
>> Peter
>>
>> Below is the code I used:
>>
>> library(limma)
>> # Read the Agilent files (median foreground and background signals)
>> agilentFiles <- list.files(pattern="U")
>> rawObj <- read.maimages(agilentFiles,
>>     columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal",
>>                    R = "rMedianSignal", Rb = "rBGMedianSignal"),
>>     annotation = c("ProbeName", "SystematicName", "ControlType"))
>> # Remove spike controls and remove background signals
>> bgObj <- rawObj
>> posControls <- which(rawObj$genes$ControlType == 1)
>> bgObj$G[posControls,] <- NA
>> bgObj$R[posControls,] <- NA
>> bgObj$Gb <- bgObj$Rb <- NULL
>> # Loess normalize
>> normObj <- normalizeWithinArrays(bgObj, method="loess", weights=NULL)
>> # Plot MvA: one figure per array, raw above normalized
>> for (i in seq_len(ncol(normObj))) {
>>     figureName <- paste(i, " MvA Plots")
>>     # Panel 3 is a thin strip at the top, reserved for the figure title
>>     mat <- matrix(c(3,1,2), nrow=3, ncol=1)
>>     layout(mat, heights=c(1,10,10))
>>     plotMA(rawObj, array=i, main="Pre-Normalization MvA",
>>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>>     abline(0,0)
>>     plotMA(normObj, array=i, main="Normalized MvA",
>>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>>     abline(0,0)
>>     layout(1)
>>     mtext(figureName, cex=1.25, line=3)
>>     # savePlot() works on the Windows graphics device used here
>>     savePlot(filename=figureName, type="png", device=dev.cur())
>> }
>>
>> > sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grDevices datasets splines graphics stats tcltk utils
>> [8] methods base
>>
>> other attached packages:
>> [1] limma_3.2.2 svSocket_0.9-48 TinnR_1.0.3 R2HTML_1.59-1
>> [5] Hmisc_3.7-0 survival_2.35-9
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.1 grid_2.10.1 lattice_0.18-3 svMisc_0.9-56 tools_2.10.1
>>
>>> -----Original Message-----
>>> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
>>> Sent: Saturday, March 13, 2010 6:39 PM
>>> To: White, Peter
>>> Cc: Bioconductor mailing list
>>> Subject: [BioC] Issue with limma and normalization of Agilent data
>>> generated with a 20-bit scan
>>>
>>> Dear Peter,
>>>
>>> You can't send attachments to the Bioconductor mailing list, so I have
>>> not seen your plots. However I am not aware of any issue such as you
>>> describe. The limma function normalizeWithinArrays includes all spots in
>>> the normalization, regardless of how large the A-value is. You haven't
>>> shown us any code, or any problem we can reproduce, so we can't tell
>>> whether or not you're doing something wrong. We don't know whether you're
>>> using probe weights, whether you've filtered control spots, etc etc.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> Date: Fri, 12 Mar 2010 10:21:41 -0500
>>>> From: "White, Peter" <Peter.White at nationwidechildrens.org>
>>>> To: "'bioconductor at stat.math.ethz.ch'"
>>>> <bioconductor at stat.math.ethz.ch>
>>>> Subject: [BioC] Issue with limma and normalization of Agilent data
>>>> generated with a 20-bit scan
>>>> Content-Type: text/plain
>>>>
>>>> I have noticed an issue with the limma normalizeWithinArrays function
>>>> (and also with marray and maNorm). When normalizing two-color data
>>>> generated with an Agilent 20-bit scanner, it fails to normalize the high
>>>> intensity data (i.e. any points with an A value > 16). In our dataset we
>>>> have in excess of 400 elements with red and green intensities ranging
>>>> from 65500 to 475100. When we loess normalize the data, any points beyond
>>>> A=16 appear to be untouched by the normalization. If the attached
>>>> figures come through this should be clear - when using maNorm and maPlot
>>>> it will plot the loess line and you can see it stop at 16.
>>>>
>>>> Is it possible for loess normalization to be extended to this higher
>>>> intensity data? Or am I just doing something wrong?
>>>>
>>>> Thanks,
>>>>
>>>> Peter
>>>>
>
> --
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber/contact
>
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor