[BioC] Adding chips to an existing set of normalised data

Rafael A. Irizarry ririzarr at jhsph.edu
Wed Jun 4 12:54:39 MEST 2003


i think approach 0 is theoretically the best. 
the only reason i ranked it as worst is because of 
how time consuming it is. of course, if someone has the time and expertise to 
code a visual basic interface that handles 250 chips in "a matter of 
minutes" then i would re-rank this approach as my favorite.

On Wed, 4 Jun 2003, Park, Richard wrote:

> Hi Rafael,
> I was just wondering if you could give me your opinion on my method of
> normalization. I was always under the impression that it is best to
> renormalize the entire data set whenever you add or remove a
> chip. This would correspond to your method 0. I do understand that this is
> the most time consuming method, but I have created a visual basic
> interface that keeps track of all the .cel files we have for our lab.
> 
> So, at any point you wish to have a different group of files to analyze,
> it is a matter of clicking on the data sets you wish to include, and from
> here we normalize everything together from the .cel files using rma. It is
> usually a matter of minutes to have everything renormalized together, and
> we currently have a collection of about 250 affy chips so far that can be
> combined together in any combination.
>
> I thought this was the most precise way of creating normalized data
> sets, but are the other methods you talked about better and more accurate?
> 
> Thanks, 
> Richard Park 
> Computational Data Analyzer
> Joslin Diabetes Center
> 
> -----Original Message-----
> From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
> Sent: Wednesday, June 04, 2003 10:53 AM
> To: Crispin Miller
> Cc: Bioconductor (E-mail)
> Subject: Re: [BioC] Adding chips to an existing set of normalised data
> 
> 
> if your data is decent what you describe won't be that big an issue,
> but here are various strategies to solve the problem you describe:
> 
> 0- keep your cel files and redo everything every time (con: not efficient 
> at all)
> 1- do rma on probe level. then before any expression level analysis 
> normalize the merged exprsets. (con: you may over-normalize)
> 2- decide on a "typical probe level distribution" and always map to that
> (con: requires choice of a distribution and some extra coding)
> 3- use a non-multi array rma (ra?). you bg correct, use a non
> multichip normalization such as rescaling (can vsn be made mono-chip?),
> and use a robust summary, e.g. median, tukey.biweight, etc...
> (con: under my definition of a good expression measure it won't be as good
> as rma but it'll be better than mas 5.0)
> to see how well this does you can put it through 
> affycomp.biostat.jhsph.edu
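
Strategy 3 processes each chip on its own, so adding a chip never changes the values already computed for the others. A minimal Python sketch of that idea follows, under loud assumptions: background correction is omitted entirely, the rescaling target `TARGET_MEDIAN` is an arbitrary illustrative constant, and the function names and the `probesets` index map are hypothetical. It is not the affy package's implementation.

```python
from math import log2
from statistics import median

# Hypothetical target: every chip is rescaled so that its median equals this.
TARGET_MEDIAN = 500.0


def rescale_array(intensities, target=TARGET_MEDIAN):
    """Rescale one chip so its median intensity equals a fixed target.

    Only this chip's own values are used, so the result does not depend
    on which other chips are in the database.
    """
    factor = target / median(intensities)
    return [x * factor for x in intensities]


def summarize_probesets(intensities, probesets):
    """Robust summary: median of log2 intensities within each probeset.

    `probesets` maps a probeset name to the indices of its probes.
    """
    return {
        name: median(log2(intensities[i]) for i in indices)
        for name, indices in probesets.items()
    }


# Toy chip with six probes in two probesets (made-up numbers).
chip = [120.0, 340.0, 560.0, 80.0, 910.0, 450.0]
probesets = {"gene_a": [0, 1, 2], "gene_b": [3, 4, 5]}

scaled = rescale_array(chip)
expression = summarize_probesets(scaled, probesets)
```

The median is used here only because it is the simplest of the robust summaries named in the list; a Tukey biweight would slot into `summarize_probesets` in the same place.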
> 
> i would rank these strategies: 2,1,3,0. to pick a
> typical probe level distribution in strategy 2 i
> would use as many arrays as possible. i would not use a parametric
> distribution, such as normal, just for computational convenience.
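
One way to read strategy 2 is as quantile normalization against a frozen reference: build an empirical (non-parametric) distribution once from as many arrays as are available, then map every chip, old or new, onto it by rank. The Python sketch below illustrates that reading only. The function names are hypothetical, all arrays are assumed to have the same number of probes, and ties are broken by probe order rather than averaged.

```python
def build_reference(arrays):
    """Average the sorted intensities across arrays, rank by rank.

    The result is an empirical reference distribution with one value
    per rank; no parametric form (e.g. normal) is assumed.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    return [sum(column) / len(column) for column in zip(*sorted_arrays)]


def map_to_reference(array, reference):
    """Replace each intensity with the reference value of the same rank."""
    if len(array) != len(reference):
        raise ValueError("array and reference must have the same length")
    # Indices of the probes ordered from lowest to highest intensity.
    order = sorted(range(len(array)), key=lambda i: array[i])
    mapped = [0.0] * len(array)
    for rank, probe_index in enumerate(order):
        mapped[probe_index] = reference[rank]
    return mapped


# Build the reference once from the existing chips (toy numbers) ...
reference = build_reference([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
# ... and map a chip that arrives later without touching the old ones.
new_chip = map_to_reference([10.0, 30.0, 20.0], reference)
# new_chip == [1.5, 5.5, 3.5]
```

Because the reference is fixed, previously normalized chips keep their values when a new one is added; the cost is that the reference has to be chosen and stored up front.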
> 
> 
> On Wed, 4 Jun 2003, Crispin Miller wrote:
> 
> > Hi!
> > Over the last few days we've been learning lots about alternate ways of dealing with low-intensity probesets and some pretty strong arguments in favour of using alternate techniques to deal with these. Firstly, thanks - the discussion has been really helpful and much appreciated! 
> > 
> > These have now sparked a different question for us:
> > We have an ever-increasing database of affymetrix chips... Currently these have been processed and normalised using MAS5.0. As we add arrays to the set, we can compare between them since the normalisation simply sets them to have the same average intensity. 
> > 
> > So the question is, if I am to normalise my data with RMA, say, I get a set of normalised arrays based on statistics generated over the set of chips I normalise - i.e. each array is normalised in the context of its peers, unlike MAS5.0 (as I understand it). This is, I think, due to the a(j) parameter in the RMA model, or phi(j) for dChip, which represent the probe affinity effects and can be estimated if we have 'enough arrays' (from Irizarry et al. 2003, NA Res paper).
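
To make the dependence on "peers" concrete: for one probeset, RMA fits an additive model on the log2 scale, roughly log2(PM[j][i]) = e(i) + a(j) + error, with e(i) the expression on array i and a(j) the affinity of probe j, and the fit is done by median polish. The Python sketch below is a plain Tukey median polish on a toy matrix, not the affy package's code; the function name, the fixed iteration count and the numbers are illustrative assumptions. The a(j) are estimated from whichever arrays are in the matrix, which is one reason the result depends on the set of chips processed together (the multi-array normalization step is another).

```python
from statistics import median


def median_polish(y, n_iter=10):
    """Fit y[j][i] ~ overall + probe_effect[j] + array_effect[i].

    y is a list of rows, one row per probe j and one column per array i,
    holding log2 intensities. Returns (overall, probe_effects,
    array_effects). The expression estimate for array i is
    overall + array_effects[i].
    """
    n_probes, n_arrays = len(y), len(y[0])
    resid = [row[:] for row in y]  # work on a copy
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    for _ in range(n_iter):
        # Sweep out the median of each row (probe).
        for j in range(n_probes):
            m = median(resid[j])
            probe_eff[j] += m
            resid[j] = [v - m for v in resid[j]]
        # Keep the array effects centred on zero.
        m = median(array_eff)
        overall += m
        array_eff = [e - m for e in array_eff]
        # Sweep out the median of each column (array).
        for i in range(n_arrays):
            m = median(resid[j][i] for j in range(n_probes))
            array_eff[i] += m
            for j in range(n_probes):
                resid[j][i] -= m
        # Keep the probe effects centred on zero.
        m = median(probe_eff)
        overall += m
        probe_eff = [p - m for p in probe_eff]
    return overall, probe_eff, array_eff


# Toy probeset: 3 probes (rows) on 3 arrays (columns), exactly additive.
toy = [
    [6.5, 7.0, 7.5],
    [7.5, 8.0, 8.5],
    [8.5, 9.0, 9.5],
]
overall, probe_effects, array_effects = median_polish(toy)
expression = [overall + e for e in array_effects]
```

Appending a column for a new array and refitting can shift the probe effects, and with them the expression estimates of the arrays that were already there.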
> > 
> > Now, when we add experiments to the database, are the normalised expression levels calculated for one experimental chip-set comparable to the expression levels computed for another? If not, do I need to apply RMA over the entire database each time I add a new experiment to it? And is this possible in a reasonable amount of time and memory? If not, do people have alternate suggestions? We are particularly interested in clustering and generation of expression profiles...
> > 
> > Crispin
> > http://bioinf.picr.man.ac.uk/mbcf/microarray_ma.shtml
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
> 


