Thanks James!!!! The paper that I referred to was a recent one (2010), so I
thought it was easier to follow. As you said, it might be better to choose
another method.

On Fri, Dec 17, 2010 at 11:51 AM, James W. MacDonald
<jmacdon@med.umich.edu> wrote:

> Hi Viritha,
>
>
> On 12/17/2010 11:11 AM, viritha kaza wrote:
>
>> Hi James,
>> I am actually interested in getting a raw (unnormalised) microarray
>> expression dataset. Since I am interested in doing this for many
>> datasets, I would like to normalize as one paper suggests, to remove
>> bias due to sample preparation and different platforms:
>> "Briefly, for each expression data set, individual probe intensity of each
>> array was divided by the averaged probe intensity across all arrays within
>> the data set, then each value was log (base 2) transformed. For
>> normalization, first, average expression value of all probes in each array
>> was calculated. Then for each array, expression value of each probe was
>> subtracted by the averaged expression value. By doing so, average
>> expression value of all probes in each array in each expression data set
>> will be zero."
>>
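
For reference, the three steps quoted from the paper can be sketched in base R
as below. This is only an illustration on simulated intensities: the matrix
`raw` and its probe and array names are made up, and it assumes that "averaged
probe intensity across all arrays" means the per-probe mean across arrays.

```r
# Simulated raw intensities: 6 probes (rows) x 4 arrays (columns).
# The values and names are hypothetical, not taken from any real CEL file.
set.seed(42)
raw <- matrix(runif(6 * 4, min = 2^6, max = 2^14), nrow = 6,
              dimnames = list(paste0("probe", 1:6), paste0("array", 1:4)))

# Step 1: divide each probe intensity by that probe's mean across arrays.
# A vector of length nrow(raw) is recycled down the columns, so element
# [i, j] is divided by rowMeans(raw)[i].
ratio <- raw / rowMeans(raw)

# Step 2: log (base 2) transform.
log_ratio <- log2(ratio)

# Step 3: subtract each array's mean, so every column mean becomes zero.
centred <- sweep(log_ratio, 2, colMeans(log_ratio), "-")

round(colMeans(centred), 10)
```

The last line should print zeros (up to floating-point error), which is the
property the paper describes.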
>
> Two things here:
>
> 1.) That normalization is as naive as you can possibly get. We have gone
> _way_ past the stage where people think a simple location normalization is a
> reasonable thing to do.
>
> All this does is shift the data so the means line up, not taking into
> account that there might be more subtle technical artifacts that should be
> removed. You will be much better served by using the stock normalization in
> rma(), or if you really want to get fancy, you might want to use vsn. But
> you will be regressing to maybe the year 2000 if you use the normalization
> you suggest here.
>
> 2.) The normalization you are considering is designed for spotted arrays,
> where each spot measures transcript from two different samples. Because of
> that fact, the data are usually reported as a ratio (e.g., cy3/cy5). For
> these data, exact equivalence of transcript would be expected to be a 1
> (e.g., equal amounts of cy3 and cy5 fluorescence). If you then take logs,
> equivalence will then be equal to zero.
>
> In that case, taking the mean and subtracting (centering on the mean) is a
> reasonable but naive thing to do. However, in your case, the data range from
> approximately 2^6 - 2^14 or so. If you take log_2 of these data, they will
> then range from 6 - 14. Because they aren't ratios, and they aren't really
> symmetrically distributed there isn't a compelling reason to normalize to
> zero.
>
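
To illustrate point 1 on simulated log2 data (not real arrays): mean-centering
lines up the means but leaves a difference in spread untouched, whereas
quantile normalization, which as far as I know is what rma() applies by
default, forces the arrays onto a common distribution. The helper
`quantile_normalize` below is a hypothetical base R sketch of the idea, not
the affy implementation.

```r
# Two simulated arrays on the log2 scale with different location and spread.
set.seed(1)
x <- cbind(array1 = rnorm(1000, mean = 8, sd = 1),
           array2 = rnorm(1000, mean = 9, sd = 2))

# Location normalization: subtract each array's mean.
centred <- sweep(x, 2, colMeans(x), "-")

# Quantile normalization sketch: replace each value by the mean of the
# values that share its rank across arrays.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))   # mean of the sorted columns
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}
qn <- quantile_normalize(x)

colMeans(centred)         # both means are (numerically) zero
apply(centred, 2, sd)     # but the spreads still differ
apply(qn, 2, quantile)    # after quantile normalization the quantiles agree
```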
> If you still want to progress with this idea, note that pretty much all of
> the summarization methods have a normalize argument, so you can simply set
> normalize = FALSE, and you will then get unnormalized, summarized data.
>
> See e.g., ?rma
>
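
A minimal sketch of the normalize = FALSE suggestion, assuming the affy and
Biobase packages are installed and the CEL files sit in one directory. The
function name `summarize_unnormalized` and the argument `cel_dir` are made up
for illustration. Note that rma() still background corrects by default and
returns values on the log2 scale.

```r
# Sketch only: requires the Bioconductor packages affy and Biobase plus CEL
# files on disk. `summarize_unnormalized` and `cel_dir` are hypothetical names.
summarize_unnormalized <- function(cel_dir) {
  # Read every CEL file found in cel_dir into an AffyBatch.
  dat <- affy::ReadAffy(celfile.path = cel_dir)
  # Background correct and summarize, but skip the normalization step.
  eset <- affy::rma(dat, normalize = FALSE)
  # Probeset x array matrix of log2 expression values.
  Biobase::exprs(eset)
}

# Example call (the path is a placeholder):
# expr <- summarize_unnormalized("path/to/cel/files")
# write.table(expr, "unnormalized_summarized.txt", quote = FALSE, sep = "\t")
```

This gives one value per probeset per array, rather than one per probe.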
> Best,
>
> Jim
>
>
>
>> Hence to perform the above steps I thought I would need a raw expression
>> dataset from the CEL files, after which I can normalise by the above
>> strategy to remove bias. So I am expecting to get a single value for each
>> probe in an array.
>> I hope this helps in understanding what exactly I want the expression
>> dataset to be.
>> Thanks,
>> Viritha
>>
>> On Fri, Dec 17, 2010 at 10:00 AM, James W. MacDonald
>> <jmacdon@med.umich.edu> wrote:
>>
>>
>>>
>>> On 12/16/2010 3:35 PM, viritha kaza wrote:
>>>
>>>> Thanks James. There was no error.
>>>> But I see that I get 11 values for the same probe. Why does it happen? If
>>>> I do the same for MM then I would get another file. How do I finally get
>>>> one value for each probe in an array?
>>>>
>>>>
>>> I think we need to back up a bit here. On Affy chips there are multiple
>>> probes used to interrogate a single transcript. As you note, for this
>>> particular chip there are usually 11 probes. All of the probes for a
>>> given transcript make up a probeset.
>>>
>>> When we process these data, we first background correct and normalize the
>>> probe values to eliminate as much non-biological variability as possible,
>>> and then we summarize all the probes in each probeset to generate the
>>> final value, which we hope is proportional to the expression of the
>>> transcript we are trying to measure.
>>>
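
As a toy illustration of the summarization step: the sketch below collapses a
simulated probeset of 11 probes into one value per array with median polish,
the summarization I understand rma() to use. The data are simulated on the
log2 scale, and `medpolish()` from base R's stats package stands in for the
real implementation.

```r
# Simulated log2 probe values for one probeset: 11 probes x 5 arrays.
# Each probe has its own affinity offset; each array its own expression level.
set.seed(7)
probe_effect <- rnorm(11, mean = 0, sd = 1)
array_level  <- c(8, 8.5, 9, 10, 7.5)
probeset <- outer(probe_effect, array_level, "+") + rnorm(11 * 5, sd = 0.1)

# Median polish splits the matrix into overall + row (probe) + column (array)
# effects plus residuals.
fit <- medpolish(probeset, trace.iter = FALSE)

# One summarized value per array: overall level plus the array effect.
summary_values <- fit$overall + fit$col
summary_values
```

The five summarized values should follow the ordering of `array_level`, even
though individual probes differ from one another by their own offsets.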
>>> So we have to be precise about our terminology. You originally asked for
>>> a text file containing unnormalized probe values, which is what the code I
>>> supplied does. Evidently that is not what you wanted, so can you
>>> precisely state what it is that you do want?
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>>
>>>> Thanks,
>>>> Viritha
>>>>
>>>> On Thu, Dec 16, 2010 at 2:18 PM, James W. MacDonald
>>>> <jmacdon@med.umich.edu> wrote:
>>>>
>>>>> Make that
>>>>>
>>>>> fun <- function(q, r) {
>>>>>   row.names(r) <- rep(q, nrow(r))
>>>>>   r
>>>>> }
>>>>>
>>>>> Which of course makes more sense.
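
To see why nrow() is the right choice, here is a self-contained sketch that
runs without any CEL files. The list `pms` is a simulated stand-in for what
the thread's code expects from pm(dat, LISTRUE = TRUE), namely a named list
with one probe-by-array matrix per probeset. The probeset IDs, array names
and the helper name `label_rows` are made up.

```r
# Simulated stand-in for pm(dat, LISTRUE = TRUE): two probesets with
# 11 probes each, measured on 5 arrays. All names are hypothetical.
set.seed(1)
arrays <- paste0("array", 1:5, ".CEL")
make_probeset <- function() {
  matrix(round(runif(11 * 5, min = 2^6, max = 2^14)), nrow = 11,
         dimnames = list(NULL, arrays))
}
pms <- list(probesetA_at = make_probeset(), probesetB_at = make_probeset())

# Label every probe row with its probeset ID. There is one row per probe,
# so the ID must be repeated nrow() times, not ncol() times.
label_rows <- function(id, mat) {
  rownames(mat) <- rep(id, nrow(mat))
  mat
}

labelled <- mapply(label_rows, names(pms), pms, SIMPLIFY = FALSE)
pm_table <- do.call(rbind, labelled)

# Write a tab-delimited file (to a temporary location in this sketch).
out_file <- tempfile(fileext = ".txt")
write.table(pm_table, out_file, quote = FALSE, sep = "\t",
            row.names = TRUE, col.names = TRUE)
dim(pm_table)
```

With ncol() the ID would be repeated 5 times (one per array) for an 11-row
matrix, which is what produces the "length of 'dimnames' [1] not equal to
array extent" error.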
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/16/2010 12:04 PM, viritha kaza wrote:
>>>>>
>>>>>> Hi James,
>>>>>> Thanks for your reply.
>>>>>> I am new to R statistics.
>>>>>> Do I have to give values for q or r? I am getting the following
>>>>>> error when I run the mapply command:
>>>>>>
>>>>>> Error in dimnames(x)<- dn :
>>>>>>   length of 'dimnames' [1] not equal to array extent
>>>>>>
>>>>>> There are 5 arrays in the experiment.
>>>>>>
>>>>>> Thank you,
>>>>>> Viritha
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 16, 2010 at 11:22 AM, James W. MacDonald
>>>>>> <jmacdon@med.umich.edu> wrote:
>>>>>>
>>>>>>> Hi Viritha,
>>>>>>>
>>>>>>> On 12/16/2010 10:45 AM, viritha kaza wrote:
>>>>>>>
>>>>>>>> Hi Group,
>>>>>>>>
>>>>>>>> Let me explain clearly. I have the [Mouse430_2] Affymetrix Mouse Genome
>>>>>>>> 430 2.0 Array. I want to create an unnormalised expression microarray
>>>>>>>> data set. I have the CEL files and CDF file for this. I want the
>>>>>>>> intensities at the probe level. Is this possible in R or with any other
>>>>>>>> tool? How can I get this expression microarray dataset?
>>>>>>>>
>>>>>>>>
>>>>>>> library(affy)
>>>>>>> dat <- ReadAffy()
>>>>>>> pms <- pm(dat, LISTRUE = TRUE)
>>>>>>> fun <- function(q, r) {
>>>>>>>   row.names(r) <- rep(q, ncol(r))
>>>>>>>   r
>>>>>>> }
>>>>>>>
>>>>>>> pms <- mapply(fun, names(pms), pms, SIMPLIFY = FALSE)
>>>>>>> pms <- do.call("rbind", pms)
>>>>>>> write.table(pms, "Raw PM data.txt", quote = FALSE, row.names = TRUE,
>>>>>>>             col.names = TRUE, sep = "\t")
>>>>>>>
>>>>>>> You can do similar for MM probes if you desire.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Viritha
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Dec 15, 2010 at 4:05 PM, viritha kaza <viritha.k@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi group,
>>>>>>>>>
>>>>>>>>> If I want to create a raw txt file of microarray data from the
>>>>>>>>> (Affymetrix) CEL file, how do I create the expression set with raw
>>>>>>>>> signal intensity? I know that only CEL files of version 3 can be
>>>>>>>>> opened as an Excel file, as they are in ASCII format.
>>>>>>>>> In one such CEL file the intensity is indicated as:
>>>>>>>>>    CellHeader=X Y MEAN STDV NPIXELS
>>>>>>>>>    0 0 137.3 25.1 36
>>>>>>>>>    1 0 10730.5 2009.9 36
>>>>>>>>>    2 0 136.3 21.2 36
>>>>>>>>> But I am not sure how to assign the probe numbers to the
>>>>>>>>> CellHeaders, and I would also like to know if the raw intensity
>>>>>>>>> taken is just the mean intensity. Can this be performed in R?
>>>>>>>>> Waiting for your response,
>>>>>>>>> Thank you in advance,
>>>>>>>>> Viritha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor@r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>> James W. MacDonald, M.S.
>>>>>>> Biostatistician
>>>>>>> Douglas Lab
>>>>>>> University of Michigan
>>>>>>> Department of Human Genetics
>>>>>>> 5912 Buhl
>>>>>>> 1241 E. Catherine St.
>>>>>>> Ann Arbor MI 48109-5618
>>>>>>> 734-615-7826
>>>>>>> **********************************************************
>>>>>>> Electronic Mail is not secure, may not be read every day, and should
>>>>>>> not
>>>>>>> be
>>>>>>> used for urgent or sensitive issues
>>>>>>>
>>>>>>>


