[BioC] How do I set the median when performing MAS5.0 normalization

James W. MacDonald jmacdon at med.umich.edu
Wed Dec 8 15:40:21 CET 2010


Hi Matthew,

On 12/7/2010 7:13 PM, Matthew Willmann wrote:
> Hi Jim,
>
> I tried doing the following, and I still could not replicate the exact values in the published paper.  I tried altering the code, as you suggested, as I do not know how to "roll yer own", as I am still too green.
>
>> mas50<- function (object, normalize = TRUE, sc = 500, analysis = "absolute",
> +    ...)
> + {
> +    res<- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas",
> +        normalize = FALSE, summary.method = "mas", ...)
> +    if (normalize)
> +        res<- affy.scalevalues.exprSet(res, sc = sc, analysis = analysis)
> +    return(res)
> + }
>> affy.scalevalues.exprSet<- function (eset, sc = 500, analysis = "absolute")
> + {
> +    analysis<- match(analysis, c("absolute", "comparison"))
> +    if (analysis == 1)
> +        nf<- 1
> +    else stop("sorry! comparison not implemented.")
> +    for (i in 1:ncol(exprs(eset))) {
> +        slg<- exprs(eset)[, i]
> +        sf<- sc/mean(slg, trim = 0.5)
> +        reported.value<- nf * sf * slg
> +        exprs(eset)[, i]<- reported.value
> +    }
> +    return(eset)
> + }

I think you are a bit confused here. Setting the mean() function to use 
50% trimming will give you the median, but keeping sc = 500 will median 
center at 500, not the 50 you originally stated.

To further complicate matters, I don't know what the authors of the 
paper you cite really did. They may in fact have not normalized the data 
in this way at all, instead just median centering at 50.

It's easy enough to do that, if you want to check. Just do something like

eset <- mas5(data.MRW, FALSE)
meds <- apply(exprs(eset), 2, median)
dat <- sweep(exprs(eset), 2, meds - 50)
exprs(eset) <- dat

Best,

Jim

>> library(affy)
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
>    Vignettes contain introductory material. To view, type
>    'openVignette()'. To cite Bioconductor, see
>    'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>
> Attaching package: 'affy'
>
> The following object(s) are masked _by_ '.GlobalEnv':
>
>      affy.scalevalue.exprSet, mas5
>
>
>> data.MRW=ReadAffy(filenames=Cel.files)
>> data.mas50=mas50(data.MRW, normalize=FALSE, sc=500)
> background correction: mas
> PM/MM correction : mas
> expression values: mas
> background correcting...done.
> 22810 ids to be processed
> |                    |
> |####################|
>> data.mas50.exprs=exprs(data.mas50)
>> data.wk=data.mas50.exprs
>> write.table(data.wk,"test.txt",sep="\t")
>
> -----------------------------------------------------
> Matthew R. Willmann, Ph.D.
> Research Associate, Poethig Lab
> University of Pennsylvania
> Department of Biology
> 433 S. University Avenue
> Philadelphia, PA 19104
> Lab phone: 215-898-8916
> Cell: 508-243-2495
> Fax: 215-898-8780
>
> On Dec 6, 2010, at 6:54 PM, James MacDonald wrote:
>
>> Hi Matthew,
>>
>> No, the sc parameter scales the data using a trimmed mean rather than the median. You can see that by inspecting first the code for mas5():
>>
>> mas5<- function (object, normalize = TRUE, sc = 500, analysis = "absolute",
>>     ...)
>> {
>>     res<- expresso(object, bgcorrect.method = "mas", pmcorrect.method = "mas",
>>         normalize = FALSE, summary.method = "mas", ...)
>>     if (normalize)
>>         res<- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis)
>>     return(res)
>> }
>>
>> and then by looking at the code for affy.scalevalue.exprSet():
>>
>> affy.scalevalue.exprSet<- function (eset, sc = 500, analysis = "absolute")
>> {
>>     analysis<- match(analysis, c("absolute", "comparison"))
>>     if (analysis == 1)
>>         nf<- 1
>>     else stop("sorry! comparison not implemented.")
>>     for (i in 1:ncol(exprs(eset))) {
>>         slg<- exprs(eset)[, i]
>>         sf<- sc/mean(slg, trim = 0.02)
>>         reported.value<- nf * sf * slg
>>         exprs(eset)[, i]<- reported.value
>>     }
>>     return(eset)
>> }
>>
>> So this just centers things on a 2% trimmed mean. You _could_ just hijack this function and set the trim to 0.5 to do the median scaling, but it is probably easier to 'roll yer own'.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>>>> Matthew Willmann  12/06/10 5:53 PM>>>
>> Hi Jim,
>>
>> Thank you.  Does this mean that I can obtain the desired normalization by the following?
>>
>> mas5(data.MRW, normalize=FALSE, sc=50)
>>
>>
>> Matthew
>> -----------------------------------------------------
>> Matthew R. Willmann, Ph.D.
>> Research Associate, Poethig Lab
>> University of Pennsylvania
>> Department of Biology
>> 433 S. University Avenue
>> Philadelphia, PA 19104
>> Lab phone: 215-898-8916
>> Cell: 508-243-2495
>> Fax: 215-898-8780
>>
>> On Dec 6, 2010, at 3:55 PM, James W. MacDonald wrote:
>>
>>> Hi Matthew,
>>>
>>> On 12/6/2010 11:16 AM, Matthew Willmann wrote:
>>>> Hello,
>>>>
>>>> I am trying to replicate results in a published paper, where they performed a MAS5.0 normalization with the median normalized to 50 and then defined genes as present if the normalized expression value is higher than 30.  Does anyone know how I can alter the standard code to establish these parameters?
>>>>
>>>>> Cel.files=list.files(pattern=".CEL")
>>>>> data.MRW=ReadAffy(filenames=Cel.files)
>>>>> mas5(data.MRW, normalize=TRUE, sc=500)
>>>
>>>  From the help page for mas5():
>>>
>>> Usage:
>>>
>>>     mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...)
>>>
>>> Arguments:
>>>
>>> object: an instance of 'AffyBatch'
>>>
>>> normalize: logical. If 'TRUE' scale normalization is used after we
>>>          obtain an instance of 'ExpressionSet'
>>>
>>> So if you use mas5() with normalize = FALSE, then you don't get any scale normalization. You can then simply scale each chip to have a median of 50.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> Thank you for your time and advice.
>>>>
>>>>
>>>> Matthew
>>>>
>>>>
>>>>
>>>> -----------------------------------------------------
>>>> Matthew R. Willmann, Ph.D.
>>>> Research Associate, Poethig Lab
>>>> University of Pennsylvania
>>>> Department of Biology
>>>> 433 S. University Avenue
>>>> Philadelphia, PA 19104
>>>> Lab phone: 215-898-8916
>>>> Cell: 508-243-2495
>>>> Fax: 215-898-8780
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Douglas Lab
>>> University of Michigan
>>> Department of Human Genetics
>>> 5912 Buhl
>>> 1241 E. Catherine St.
>>> Ann Arbor MI 48109-5618
>>> 734-615-7826
>>> **********************************************************
>>> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
>>>
>>
>>
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
>>
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 



More information about the Bioconductor mailing list