[BioC] question regarding MAS5 normalization with reduced probes

James W. MacDonald jmacdon at med.umich.edu
Tue Aug 31 19:15:10 CEST 2010


Hi James,

On 8/31/2010 12:17 PM, James Anderson wrote:
> Hi Jim,
>
> Thanks a lot for the link. I've tried the code there, and it works without any problem if I take whole probesets out. However, I run into a problem when I need to remove not only some probesets but also some individual probes (not whole probesets), maybe because I did not provide the probes in the correct format.
>
> (I assume you are familiar with the script at the link.)
>
> If I randomly take out 2000 probe sets from U133A,
> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
> RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobeSets, cleancdf)
>
> It works fine, and any AffyBatch object read using cleancdf has reduced dimensions.
>
> However, if I do
>
> maskedprobeSets = rownames(MAS5_matrix)[sample(1:22283,2000)]
> maskedprobes = rownames(pm(A))[1:2000]

Assuming that 'A' is an AffyBatch, what you get back from that call
to rownames is a vector of numbers stored as character strings (they are
the row indices of the PM probes in the intensity matrix).

An example using the Dilution dataset:

 > rownames(pm(Dilution))[1:10]
  [1] "175218" "356689" "227696" "237919" "275173" "203444" "357984" "368524"
  [9] "285352" "304510"

As you can see, these are not very useful. What you want are the probeset
IDs, each with an appended number giving the position of the probe
within its probeset.
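A minimal base-R sketch of building such labels for every PM probe at once. It assumes you already have one probeset ID per PM probe, in the same order as the rows of pm(A); affy's probeNames(A) should give you that, but please check the ordering on your own data. The helper name probeLabels() is made up for this example, and the position is simply the probe's order of appearance within its probeset.

```r
# Hypothetical helper: turn a vector of per-probe probeset IDs
# (one entry per PM probe) into labels of the form
# <probeset ID><position>, e.g. "100_g_at3".
probeLabels <- function(psetIDs) {
  # Running count of each probe within its own probeset: 1, 2, 3, ...
  pos <- ave(seq_along(psetIDs), psetIDs, FUN = seq_along)
  paste0(psetIDs, pos)
}

# Toy input standing in for probeNames(A)
ids <- c(rep("100_g_at", 3), rep("1000_at", 2))
probeLabels(ids)

# In a real session, something like this (assumed, verify the ordering):
#   library(affy)
#   labs <- probeLabels(probeNames(A))
#   maskedprobes <- labs[1:2000]
```

Because ave() groups by probeset ID, the counter restarts for each probeset even if the IDs are not contiguous.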

Now, say we are concerned about the "100_g_at" probeset in the Dilution 
dataset:

 > pm(Dilution, "100_g_at")
               20A   20B    10A   10B
100_g_at1   221.3 146.3  192.0 116.0
100_g_at2   685.0 479.0  493.0 328.3
100_g_at3  1126.3 724.3  849.0 498.3
100_g_at4   205.0 126.5  136.0  97.0
100_g_at5   580.8 341.8  374.0 226.0
100_g_at6   161.3 109.5  139.0  92.3
100_g_at7  1645.3 992.3 1006.8 670.0
100_g_at8   624.0 348.0  336.3 224.5
100_g_at9   274.0 156.0  203.8 119.0
100_g_at10  240.0 156.3  223.0 122.0
100_g_at11  438.0 278.3  362.5 198.0
100_g_at12  554.0 334.8  421.5 220.0
100_g_at13  235.0 148.0  151.0 107.5
100_g_at14  571.3 415.0  508.0 271.0
100_g_at15  904.0 562.0  689.0 330.0
100_g_at16  141.0  93.0  113.5  75.5

Suppose we want to drop the third and seventh probes. We could use

 > rownames(pm(Dilution, "100_g_at"))[c(3,7)]
[1] "100_g_at3" "100_g_at7"

And feed that into RemoveProbes(), which will then work.
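A short sketch of putting the two kinds of input together for the case above. The whole-probeset IDs are hypothetical placeholders, and the regular expression is only a rough, assumed check that each label ends in digits preceded by a non-digit, which the bare numeric rownames of pm(A) fail. RemoveProbes() is the function from the script linked earlier, not from a package, so its call is left commented out and mirrors the one in the quoted message.

```r
# Probes to drop, written as <probeset ID><position>:
# the 3rd and 7th PM probes of "100_g_at", as in the example above.
badPset <- "100_g_at"
badPos  <- c(3, 7)
maskedprobes <- paste0(badPset, badPos)
maskedprobes  # "100_g_at3" "100_g_at7"

# Whole probesets to drop are given by their plain IDs
# (hypothetical picks, replace with your own).
maskedprobeSets <- c("1000_at", "1001_at")

# Rough check: each probe label should be a probeset ID followed by
# digits, unlike numeric rownames such as "175218".
stopifnot(all(grepl("^.*[^0-9][0-9]+$", maskedprobes)))

# RemoveProbes() comes from the script linked earlier in this thread:
# RemoveProbes(listOutProbes = maskedprobes,
#              listOutProbeSets = maskedprobeSets, cleancdf)
```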

Best,

Jim



> RemoveProbes(listOutProbes=maskedprobes, listOutProbeSets=maskedprobeSets, cleancdf)
>
> The error msg shows as:
> Error in get(pset[i], env = get(cdfpackagename)) :
>    object '315997at' not found
>
> Do you know what is the correct format of the input for the probes (not probe sets) to be taken out?
>
>
>
> Thanks a lot,
>
>
> -James
>
>
> --- On Mon, 8/30/10, James W. MacDonald<jmacdon at med.umich.edu>  wrote:
>
> From: James W. MacDonald<jmacdon at med.umich.edu>
> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
> To: "James Anderson"<janderson_net at yahoo.com>
> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
> Date: Monday, August 30, 2010, 12:25 PM
>
> Hi James,
>
> I misunderstood your question. I thought you already had a reduced set
> of probes you wanted to run mas5() on.
>
> So yeah, if you want to use a reduced set of probes you could use some
> code written by Ariel Chernomoretz (and modified by Jenny Drnevitch)
> that has been posted and referenced many times on this list:
>
> https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>
> Alternatively, you could play with the affxparser package, which has the
> capability (IIRC) to do the same.
>
> Best,
>
> Jim
>
>
>
> On 8/30/2010 10:29 AM, James Anderson wrote:
>> Hi Jim,
>>
>> Thanks for your email. I've run mas5 before, but only with the default
>> settings. From the help page, there does not seem to be a way to
>> specify a reduced set of probes. It looks like what matters is whether
>> the "object" was read using a reduced set of probes (I believe that if
>> the object is read using only the reduced set, mas5 will do the job).
>> So the question may really be about ReadAffy, but that does not seem
>> to have an option for specifying a reduced set of probes either, unless
>> we use an alternative CDF file. Below is the usage of the mas5 function:
>>
>> mas5(object, normalize = TRUE, sc = 500, analysis = "absolute", ...)
>>
>> Thanks,
>>
>> -James
>>
>> --- On Fri, 8/27/10, James W. MacDonald<jmacdon at med.umich.edu>
>> wrote:
>>
>> From: James W. MacDonald<jmacdon at med.umich.edu>
>> Subject: Re: [BioC] question regarding MAS5 normalization with reduced probes
>> To: "James Anderson"<janderson_net at yahoo.com>
>> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
>> Date: Friday, August 27, 2010, 10:04 AM
>>
>> Hi James,
>>
>> On 8/26/2010 1:05 PM, James Anderson wrote:
>>> Hi,
>>>
>>> I am trying to use MAS5 to normalize some CEL files with a reduced
>>> set of probes (probes whose PM is not significantly higher than MM
>>> are filtered out). Does anyone know how to do this? Does that
>>> require creating a new CDF file?
>>
>> Have you tried running mas5() from the affy package? Having never
>> tried, I don't know, but it seems a simple enough test.
>>
>> If you do need to create a new cdf, you will want to use the
>> affxparser package.
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> thanks a bunch,
>>>
>>> -James
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


