[BioC] Conversion of affymetrix cell file to raw text file

James W. MacDonald jmacdon at med.umich.edu
Fri Dec 17 17:51:45 CET 2010


Hi Viritha,

On 12/17/2010 11:11 AM, viritha kaza wrote:
> Hi James,
> I am actually interested in getting a raw (unnormalised) microarray
> expression dataset. Since I want to do this for many datasets, I would
> like to normalize as one paper suggests, to remove bias due to sample
> preparation and different platforms:
> "Briefly, for each expression data set, individual probe intensity of each
> array was divided by the averaged probe intensity across all arrays within
> the data set, then each value was log (base 2) transformed. For
> normalization, first, average expression value of all probes in each array
> was calculated. Then for each array, expression value of each probe was
> subtracted by the averaged expression value. By doing so, average expression
> value of all probes in each array in each expression data set will be zero."

Two things here:

1.) That normalization is as naive as you can possibly get. We have gone 
_way_ past the stage where people think a simple location normalization 
is a reasonable thing to do.

All this does is shift the data so the means line up, not taking into 
account that there might be more subtle technical artifacts that should 
be removed. You will be much better served by using the stock 
normalization in rma(), or if you really want to get fancy, you might 
want to use vsn. But you will be regressing to maybe the year 2000 if 
you use the normalization you suggest here.

2.) The normalization you are considering is designed for spotted
arrays, where each spot measures transcript from two different samples.
Because of that, the data are usually reported as a ratio (e.g.,
Cy3/Cy5). For these data, equal amounts of transcript in the two samples
would be expected to give a ratio of 1 (i.e., equal amounts of Cy3 and
Cy5 fluorescence). If you then take logs, equivalence corresponds to zero.

In that case, taking the mean and subtracting (centering on the mean) is
a reasonable but naive thing to do. However, in your case, the data
range from approximately 2^6 to 2^14 or so. If you take log_2 of these
data, they will then range from about 6 to 14. Because they aren't
ratios, and they aren't really symmetrically distributed, there isn't a
compelling reason to normalize to zero.
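
To make both points concrete, here is a minimal base R sketch (not part
of the original exchange). It applies the arithmetic quoted from the
paper to simulated intensities and compares the result with quantile
normalization, which is the normalization step rma() uses by default.
The simulated data, the offsets and scales, and the helper
quantile_normalize() are all made up for illustration; in practice you
would call rma() rather than this hand-rolled helper.

```r
set.seed(1)

# Hypothetical data: 2000 probes on 5 arrays, log2 values roughly 6 to 14.
# Each array gets its own offset and spread to mimic technical artifacts.
n_probes <- 2000
base     <- runif(n_probes, 7, 13)
offsets  <- c(0, 0.3, -0.2, 0.5, -0.4)
scales   <- c(1, 1.1, 0.9, 1.2, 0.8)
log_raw  <- sapply(1:5, function(k) {
  10 + scales[k] * (base - 10) + offsets[k] + rnorm(n_probes, sd = 0.1)
})
colnames(log_raw) <- paste0("array", 1:5)
raw <- 2^log_raw

# The procedure quoted from the paper:
# 1) divide each probe by its mean intensity across arrays,
# 2) take log2,
# 3) subtract each array's mean.
ratio    <- raw / rowMeans(raw)
log_rat  <- log2(ratio)
centered <- sweep(log_rat, 2, colMeans(log_rat), "-")

# Quantile normalization sketch: every array is given the same empirical
# distribution, namely the row means of the column-wise sorted values.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}
qn <- quantile_normalize(log2(raw))

# Centering makes every array mean zero, but the spreads still differ.
print(round(colMeans(centered), 10))
print(apply(centered, 2, sd))

# After quantile normalization the arrays share one distribution.
print(all.equal(sort(qn[, 1]), sort(qn[, 5])))
```

The centered arrays agree only in their means, whereas the
quantile-normalized arrays agree in their whole distribution. That is
the sense in which a pure location shift leaves other technical
differences in place.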

If you still want to progress with this idea, note that pretty much all 
of the summarization methods have a normalize argument, so you can 
simply set normalize = FALSE, and you will then get unnormalized, 
summarized data.

See e.g., ?rma
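
A minimal sketch of that suggestion, assuming the affy package is
installed and the CEL files for one experiment sit in a single
directory. The function name write_unnormalized_rma(), the argument
cel_dir and the output file name are hypothetical and chosen only for
this example. Note that with normalize = FALSE, rma() still background
corrects and summarizes, and the values are on the log2 scale.

```r
# Sketch only: 'cel_dir' is a hypothetical directory holding the CEL files.
write_unnormalized_rma <- function(cel_dir, out_file = "rma_unnormalized.txt") {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("Package 'affy' is required; install it from Bioconductor.")
  }
  # Read every CEL file found in cel_dir into an AffyBatch.
  dat <- affy::ReadAffy(celfile.path = cel_dir)
  # Background correct and summarize, but skip the normalization step.
  eset <- affy::rma(dat, normalize = FALSE)
  # One row per probeset, one column per array.
  expr <- Biobase::exprs(eset)
  write.table(expr, out_file, quote = FALSE, sep = "\t", col.names = NA)
  invisible(expr)
}
```

Calling write_unnormalized_rma("path/to/cel/files") would then write one
summarized value per probeset per array, which you could feed into
whatever normalization you decide on.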

Best,

Jim


> Hence, to perform the above steps, I thought I would need a raw expression
> dataset from the CEL files, after which I can normalise by the above
> strategy to remove bias. So I am expecting to get a single value for each
> probe in an array.
> I hope this helps in understanding what exactly I want the expression
> dataset to be.
> Thanks,
> Viritha
>
> On Fri, Dec 17, 2010 at 10:00 AM, James W. MacDonald
> <jmacdon at med.umich.edu> wrote:
>
>>
>>
>> On 12/16/2010 3:35 PM, viritha kaza wrote:
>>
>>> Thanks James. There was no error.
>>> But I see that I get 11 values for the same probe. Why does that happen?
>>> If I do the same for MM, I would get yet another file. How do I finally
>>> get one value for each probe in an array?
>>>
>>
>> I think we need to back up a bit here. On Affy chips there are multiple
>> probes used to interrogate a single transcript. As you note, for this
>> particular chip there are usually 11 probes. All of the probes for a given
>> transcript make up a probeset.
>>
>> When we process these data, we first background correct and normalize the
>> probe values to eliminate as much non-biological variability as possible,
>> and then we summarize all the probes in each probeset to generate the final
>> value, which we hope is proportional to the expression of the transcript we
>> are trying to measure.
>>
>> So we have to be precise about our terminology. You originally asked for a
>> text file containing unnormalized probe values, which is what the code I
>> supplied does. Evidently that is not what you wanted, so can you precisely
>> state what it is that you do want?
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>>
>>> Thanks,
>>> Viritha
>>>
>>> On Thu, Dec 16, 2010 at 2:18 PM, James W. MacDonald
>>> <jmacdon at med.umich.edu> wrote:
>>>
>>>> Make that
>>>>
>>>> fun <- function(q, r) {
>>>>   # label every row of r with the probeset name q
>>>>   row.names(r) <- rep(q, nrow(r))
>>>>   r
>>>> }
>>>>
>>>> Which of course makes more sense.
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>>>> On 12/16/2010 12:04 PM, viritha kaza wrote:
>>>>
>>>>> Hi James,
>>>>> Thanks for your reply,
>>>>> I am new to R statistics.
>>>>> Do I have to give values for q or r? I am getting the following
>>>>> error when I type the mapply command:
>>>>>
>>>>> Error in dimnames(x)<- dn :
>>>>>    length of 'dimnames' [1] not equal to array extent
>>>>>
>>>>> There are 5 arrays in the experiment.
>>>>>
>>>>> Thank you,
>>>>> Viritha
>>>>>
>>>>>
>>>>> On Thu, Dec 16, 2010 at 11:22 AM, James W. MacDonald
>>>>> <jmacdon at med.umich.edu> wrote:
>>>>>
>>>>>> Hi Viritha,
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/16/2010 10:45 AM, viritha kaza wrote:
>>>>>>
>>>>>>> Hi Group,
>>>>>>>
>>>>>>> Let me explain clearly. I have the [Mouse430_2] Affymetrix Mouse Genome
>>>>>>> 430 2.0 Array. I want to create an unnormalised expression microarray
>>>>>>> data set. I have the CEL files and the CDF file for this. I want the
>>>>>>> intensities at the probe level. Is this possible in R or any other
>>>>>>> source? Or how can I get this expression microarray dataset?
>>>>>>>
>>>>>> library(affy)
>>>>>> dat <- ReadAffy()
>>>>>> pms <- pm(dat, LISTRUE = TRUE)
>>>>>> fun <- function(q, r) {
>>>>>>   # ncol(r) is corrected to nrow(r) in the follow-up message
>>>>>>   row.names(r) <- rep(q, ncol(r))
>>>>>>   r
>>>>>> }
>>>>>>
>>>>>> pms <- mapply(fun, names(pms), pms, SIMPLIFY = FALSE)
>>>>>> pms <- do.call("rbind", pms)
>>>>>> write.table(pms, "Raw PM data.txt", quote = FALSE, row.names = TRUE,
>>>>>>             col.names = TRUE, sep = "\t")
>>>>>>
>>>>>> You can do similar for MM probes if you desire.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thank you in advance,
>>>>>>> Viritha
>>>>>>>
>>>>>>> On Wed, Dec 15, 2010 at 4:05 PM, viritha kaza <viritha.k at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi group,
>>>>>>>>
>>>>>>>> If I want to create a raw text file of microarray data from the
>>>>>>>> (Affymetrix) CEL file, how do I create the expression set with raw
>>>>>>>> signal intensity? I know that only CEL files of version 3 can be
>>>>>>>> opened in Excel, as they are in ASCII format.
>>>>>>>> In one such CEL file the intensity is indicated as:
>>>>>>>>     CellHeader=X Y MEAN STDV NPIXELS
>>>>>>>>     0 0 137.3 25.1 36
>>>>>>>>     1 0 10730.5 2009.9 36
>>>>>>>>     2 0 136.3 21.2 36
>>>>>>>> But I am not sure how to assign the probe numbers to the CellHeaders,
>>>>>>>> and I would also like to know whether the raw intensity taken is just
>>>>>>>> the mean intensity. Can this be performed in R?
>>>>>>>> Waiting for your response,
>>>>>>>> Thank you in advance,
>>>>>>>> Viritha
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>> James W. MacDonald, M.S.
>>>>>> Biostatistician
>>>>>> Douglas Lab
>>>>>> University of Michigan
>>>>>> Department of Human Genetics
>>>>>> 5912 Buhl
>>>>>> 1241 E. Catherine St.
>>>>>> Ann Arbor MI 48109-5618
>>>>>> 734-615-7826
>>>>>> **********************************************************
>>>>>> Electronic Mail is not secure, may not be read every day, and should
>>>>>> not be used for urgent or sensitive issues
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


