[BioC] How do I set the median when performing MAS5.0 normalization

Matthew Willmann willmann at sas.upenn.edu
Wed Dec 8 01:13:40 CET 2010


Hi Jim,

I tried doing the following, and I still could not replicate the exact values in the published paper.  I tried altering the code, as you suggested, as I do not know how to "roll yer own", as I am still too green.

> mas50 <- function (object, normalize = TRUE, sc = 500, analysis = "absolute", 
+    ...) 
+ {
+    res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", 
+        normalize = FALSE, summary.method = "mas", ...)
+    if (normalize) 
+        res <- affy.scalevalues.exprSet(res, sc = sc, analysis = analysis)
+    return(res)
+ }
> affy.scalevalues.exprSet <- function (eset, sc = 500, analysis = "absolute") 
+ {
+    analysis <- match(analysis, c("absolute", "comparison"))
+    if (analysis == 1) 
+        nf <- 1
+    else stop("sorry! comparison not implemented.")
+    for (i in 1:ncol(exprs(eset))) {
+        slg <- exprs(eset)[, i]
+        sf <- sc/mean(slg, trim = 0.5)
+        reported.value <- nf * sf * slg
+        exprs(eset)[, i] <- reported.value
+    }
+    return(eset)
+ }
> library(affy)
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.


Attaching package: 'affy'

The following object(s) are masked _by_ '.GlobalEnv':

    affy.scalevalue.exprSet, mas5


> data.MRW=ReadAffy(filenames=Cel.files)
> data.mas50=mas50(data.MRW, normalize=FALSE, sc=500)
background correction: mas 
PM/MM correction : mas 
expression values: mas 
background correcting...done.
22810 ids to be processed
|                    |
|####################|
> data.mas50.exprs=exprs(data.mas50)
> data.wk=data.mas50.exprs
> write.table(data.wk,"test.txt",sep="\t")

-----------------------------------------------------
Matthew R. Willmann, Ph.D.
Research Associate, Poethig Lab
University of Pennsylvania
Department of Biology
433 S. University Avenue
Philadelphia, PA 19104
Lab phone: 215-898-8916
Cell: 508-243-2495
Fax: 215-898-8780

On Dec 6, 2010, at 6:54 PM, James MacDonald wrote:

> Hi Matthew,
> 
> No, the sc parameter scales the data using a trimmed mean rather than the median. You can see that by inspecting first the code for mas5():
> 
> mas5 <- function (object, normalize = TRUE, sc = 500, analysis = "absolute", 
>    ...) 
> {
>    res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas", 
>        normalize = FALSE, summary.method = "mas", ...)
>    if (normalize) 
>        res <- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis)
>    return(res)
> }
> 
> and then by looking at the code for affy.scalevalue.exprSet():
> 
> affy.scalevalue.exprSet <- function (eset, sc = 500, analysis = "absolute") 
> {
>    analysis <- match(analysis, c("absolute", "comparison"))
>    if (analysis == 1) 
>        nf <- 1
>    else stop("sorry! comparison not implemented.")
>    for (i in 1:ncol(exprs(eset))) {
>        slg <- exprs(eset)[, i]
>        sf <- sc/mean(slg, trim = 0.02)
>        reported.value <- nf * sf * slg
>        exprs(eset)[, i] <- reported.value
>    }
>    return(eset)
> }
> 
> So this just centers things on a 2% trimmed mean. You _could_ just hijack this function and set the trim to 0.5 to do the median scaling, but it is probably easier to 'roll yer own'.
> 
> Best,
> 
> Jim
> 
> 
> 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
>>>> Matthew Willmann  12/06/10 5:53 PM >>>
> Hi Jim,
> 
> Thank you.  Does this mean that I can obtain the desired normalization by the following?
> 
> mas5(data.MRW, normalize=FALSE, sc=50)
> 
> 
> Matthew
> -----------------------------------------------------
> Matthew R. Willmann, Ph.D.
> Research Associate, Poethig Lab
> University of Pennsylvania
> Department of Biology
> 433 S. University Avenue
> Philadelphia, PA 19104
> Lab phone: 215-898-8916
> Cell: 508-243-2495
> Fax: 215-898-8780
> 
> On Dec 6, 2010, at 3:55 PM, James W. MacDonald wrote:
> 
>> Hi Matthew,
>> 
>> On 12/6/2010 11:16 AM, Matthew Willmann wrote:
>>> Hello,
>>> 
>>> I am trying to replicate results in a published paper, where they performed a MAS5.0 normalization with the median normalized to 50 and then defined genes as present if the normalized expression value is higher than 30.  Does anyone know how I can alter the standard code to establish these parameters?
>>> 
>>>> Cel.files=list.files(pattern=".CEL")
>>>> data.MRW=ReadAffy(filenames=Cel.files)
>>>> mas5(data.MRW, normalize=TRUE, sc=500)
>> 
>> From the help page for mas5():
>> 
>> Usage:
>> 
>>    mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...)
>> 
>> Arguments:
>> 
>> object: an instance of 'AffyBatch'
>> 
>> normalize: logical. If 'TRUE' scale normalization is used after we
>>         obtain an instance of 'ExpressionSet'
>> 
>> So if you use mas5() with normalize = FALSE, then you don't get any scale normalization. You can then simply scale each chip to have a median of 50.
>> 
>> Best,
>> 
>> Jim
>> 
>> 
>>> 
>>> Thank you for your time and advice.
>>> 
>>> 
>>> Matthew
>>> 
>>> 
>>> 
>>> -----------------------------------------------------
>>> Matthew R. Willmann, Ph.D.
>>> Research Associate, Poethig Lab
>>> University of Pennsylvania
>>> Department of Biology
>>> 433 S. University Avenue
>>> Philadelphia, PA 19104
>>> Lab phone: 215-898-8916
>>> Cell: 508-243-2495
>>> Fax: 215-898-8780
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 
>> 
> 
> 
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 
> 
> 



More information about the Bioconductor mailing list