Fwd: RE: [BioC] large amount of slides

Naomi Altman naomi at stat.psu.edu
Fri Jun 4 21:08:21 CEST 2004


>Date: Fri, 04 Jun 2004 11:59:39 -0400
>To: "Park, Richard" <Richard.Park at joslin.harvard.edu>
>From: Naomi Altman <naomi at stat.psu.edu>
>Subject: RE: [BioC] large amount of slides
>
>Feedback on this proposal would be appreciated:
>
>1) Start with quantile normalization of the probes for a random subset of
>the arrays.
>
>2) Use these slides to define the reference probe-intensity distribution F.
>
>3) Use F to do quantile normalization of probes on all of the arrays.
>
>4) Use a within-array robust summary (e.g. the Tukey biweight) to combine the
>probes into gene-level values.
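>
>A minimal sketch of steps 1-4 in base R, assuming a hypothetical matrix 'pm'
>of PM probe intensities (probes x arrays) and a vector 'probeset' giving the
>probe set of each row; the one-step biweight below is a simplified stand-in
>for affy's Tukey biweight, not the package's own code:
>
>  set.seed(1)
>  ref <- sample(ncol(pm), 30)                    # 1) random subset of arrays
>  ## 2) reference distribution F = mean of the sorted log2 intensities
>  F.ref <- rowMeans(apply(log2(pm[, ref]), 2, sort))
>  ## 3) quantile-normalize every array against F (rank -> reference quantile)
>  norm <- apply(log2(pm), 2, function(x) F.ref[rank(x, ties.method = "first")])
>  ## 4) combine probes into probe-set summaries within each array
>  biweight <- function(x, c = 5, eps = 1e-4) {
>    m <- median(x); s <- median(abs(x - m)) + eps
>    w <- (1 - pmin(1, abs(x - m) / (c * s))^2)^2
>    sum(w * x) / sum(w)
>  }
>  expr <- apply(norm, 2, function(col) tapply(col, probeset, biweight))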
>
>There was previous discussion of within-slide probe combination versus
>between-slide combination (median polish) under the topic "median polish vs
>mas".  The upshot was that within-slide combination cannot detect probe-wise
>outliers adequately.  This probably means that step 4 could be improved upon
>in some clever way that uses several arrays but is faster than running
>median polish on all of the arrays.
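>
>For comparison, a sketch of the between-array alternative for a single probe
>set: median polish across arrays (the summarization RMA uses), whose residuals
>are what expose probe-wise outliers.  Here 'ps' is a hypothetical probes x
>arrays matrix of normalized log2 intensities for one probe set, and the 3*MAD
>cutoff is only illustrative:
>
>  fit <- medpolish(ps, trace.iter = FALSE)
>  expr.by.array <- fit$overall + fit$col                  # one value per array
>  flagged <- abs(fit$residuals) > 3 * mad(fit$residuals)  # probe-wise outliers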
>
>--Naomi
>
>
>At 11:40 AM 6/4/2004 -0400, you wrote:
>>Hi Vada,
>>I would caution you about running RMA on that many data sets. I have noticed
>>a trend with RMA: expression estimates get even more underestimated as the
>>number and variance of the data sets increase. I have been doing an analysis
>>of immune cell types with about 100 CEL files. My computer (Windows 2000,
>>2 GB of RAM, 2.6 GHz Pentium 4) gives out at around 70 data sets; I am pretty
>>sure the problem is that Windows 2000 has a maximum memory allocation of 1 GB.
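>>
>>(A sketch of the usual workarounds, assuming a Windows build of R and the
>>affy package: memory.limit() is Windows-only, and justRMA() avoids building a
>>full AffyBatch, which is usually the memory bottleneck; the 2047 MB value is
>>only illustrative.)
>>
>>  library(affy)
>>  memory.limit()                    # current memory cap in MB (Windows only)
>>  memory.limit(size = 2047)         # request a larger cap (can only increase)
>>  eset <- justRMA(filenames = list.celfiles())  # RMA without a full AffyBatch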
>>
>>But if most of your data is fairly closely related (i.e. the same tissue,
>>just a knockout vs. wild-type comparison), you should be fine with RMA. I
>>would caution against using RMA on data sets that are very different.
>>
>>hth,
>>richard
>>
>>-----Original Message-----
>>From: Vada Wilcox [mailto:v_wilcox at hotmail.com]
>>Sent: Friday, June 04, 2004 11:06 AM
>>To: bioconductor at stat.math.ethz.ch
>>Subject: [BioC] large amount of slides
>>
>>
>>Dear all,
>>
>>I have been using RMA successfully for a while now, but in the past I have
>>only used it on a small number of slides. I would like to do my study on a
>>larger scale now, with data (series of experiments) from other researchers
>>as well. My question is the following: if I want to study, let's say, 200
>>slides, do I have to read them all into R at once (together, I mean, with
>>read.affy() in package affy), or is it OK to read them series by series (so
>>all wild types and controls of one researcher at a time)?
>>
>>If it is really necessary to read all of them in at one time, how much RAM
>>would I need (for, let's say, 200 CEL files), and how can I raise the
>>available memory? I know it's possible to raise it by using 'max vsize = ...',
>>but I haven't been able to do it successfully for 200 experiments. Can
>>somebody help me with this?
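>>
>>(For concreteness, the kind of setup being asked about, sketched with the
>>affy package; the startup flags and sizes below are only illustrative:)
>>
>>  ## start R with a larger memory ceiling, e.g. on Windows:
>>  ##   Rgui.exe --max-mem-size=2047M   (or use --max-vsize, the 'max vsize'
>>  ##   option mentioned above, e.g. --max-vsize=2048M)
>>  library(affy)
>>  ab   <- ReadAffy(filenames = list.celfiles())  # read all CEL files together
>>  eset <- rma(ab)                          # normalize and summarize jointly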
>>
>>Many thanks in advance,
>>
>>Vada
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>>
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


