[BioC] Reading 5000 celfiles with ReadAffy

Henrik Bengtsson hb at biostat.ucsf.edu
Wed Feb 22 21:56:32 CET 2012


Hi Ying.

On Wed, Feb 22, 2012 at 7:16 AM, ying chen <ying_chen at live.com> wrote:
> Hi Henrik,
>
> The regular machine does not mean windows machine, right?

No, it means any machine on which you can install R, e.g. OS X,
Linux or Windows.  A good test is to see whether you can install the
affxparser package (from Bioconductor).  If so, you can also use
aroma.affymetrix.
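
A minimal sketch of that test in R.  The helper name `can_load` is
hypothetical, and `BiocManager::install()` is the installer used by
current Bioconductor releases (an assumption about your setup; it is
only mentioned in a message here, not called):

```r
# Hypothetical helper: TRUE if the named package is installed and loadable.
can_load <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (can_load("affxparser")) {
  message("affxparser is available; aroma.affymetrix should work here too.")
} else {
  # Not run automatically, since installing needs network access.
  message("affxparser is missing; try: BiocManager::install(\"affxparser\")")
}
```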

>  When I run aroma
> on windows 7 64-bit machines, the problem is that the R GUI window always
> freezes (Not Responding) when I tried to process more than 2000 cels. I
> tried once with 1400 cels with aroma and it finished successfully :)

I don't see why it shouldn't work with Rgui.  Did you make sure to
disable 'Misc -> Buffered output (Ctrl+W)' in Rgui?  That way you will
see all messages as they appear, and not only once the run has
completed, which may look like a "freeze".  If you use a high verbosity
level in aroma, it will print lots of messages.  Also, with the buffer
enabled, it may be that it overflows (which would then be a bug in
Rgui).  If this is not the cause, I'd be happy to learn more about your
issues (please send a message to the aroma.affymetrix mailing list).

FYI, I first started to develop aroma on Windows XP 32-bit w/ 1.5GB
RAM.  Now I'm on Windows 7 64-bit w/ 8GB RAM, but the design strategy
is still to support machines with very little RAM (~500MB) as well as
those with lots of RAM (e.g. 128GB).  There are settings for
specifying how much memory to occupy.
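
As a rough sketch of such a setting: to my understanding of the aroma
settings documentation, "memory/ram" is a scale factor (default 1) for
the size of the chunks that are processed in memory.  The value 2.0 is
only an illustration, and the block does nothing if aroma.affymetrix is
not installed:

```r
# Assumption: "memory/ram" scales aroma's in-memory chunk sizes (default 1).
if (requireNamespace("aroma.affymetrix", quietly = TRUE)) {
  library("aroma.affymetrix")
  # Allow aroma to use roughly twice the default amount of memory per chunk.
  setOption(aromaSettings, "memory/ram", 2.0)
  print(getOption(aromaSettings, "memory/ram"))
}
```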

[more below]

>
> Thanks,
>
> Ying
>
>> From: hb at biostat.ucsf.edu
>> Date: Tue, 21 Feb 2012 15:48:04 -0800
>> To: Ying.Chen at imclone.com
>> CC: saif.urrehman at icr.ac.uk; bioconductor at r-project.org
>> Subject: Re: [BioC] Reading 5000 celfiles with ReadAffy
>
>>
>> I bet a dinner w/ drinks that aroma.affymetrix can process 20,000 such
>> CEL files on regular machine (say >2GB of RAM). The bet is open to
>> the first 3 persons who challenge me (email me). I would be happy to
>> raise the number to 100,000 CEL files, but that'll be hard to find ;)

Finally, since a few people emailed me offline commenting on the disk
space available on "regular" machines, the quick answer is that you'll
need ~220GB of free disk space to process 20,000 HT_HG-U133A CEL files.
Here are the details: Each HT_HG-U133A CEL file is ~5.5MB, so 20,000
such CEL files occupy ~105GB of disk space.  When running RMA, the
aroma pipeline stores intermediate and final results on file, i.e.
quantile-normalized data (as ~5.5MB CEL files) and chip-effect
estimates (as ~0.3MB files).  Thus, for each HT_HG-U133A array
processed, one needs ~11.3MB of disk space (5.5 + 5.5 + 0.3).  (If one
is willing to delete the raw data, one can actually get by with ~5.8MB
per array.)  Thus, to do RMA on 20,000 HT_HG-U133A CEL files, you'll
need ~220GB of disk space.  I consider that fairly "regular" by today's
standards.  About RAM: you'll most likely be able to get by with as
little as 500MB of RAM.
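
The disk-space estimate can be reproduced with a few lines of R.  This
is only a back-of-the-envelope sketch using the rounded per-file sizes
quoted above and 1GB = 1024MB (my assumption); the variable names are
made up for the example.  Because the inputs are rounded, the raw-data
total comes out slightly above the ~105GB quoted:

```r
# Rounded figures from the message above.
n_arrays <- 20000
raw_mb   <- 5.5   # one raw CEL file
norm_mb  <- 5.5   # one quantile-normalized CEL file
chip_mb  <- 0.3   # one chip-effect file

per_array_mb <- raw_mb + norm_mb + chip_mb                 # about 11.3 MB
raw_only_gb  <- n_arrays * raw_mb / 1024                   # about 107.4 GB
total_gb     <- n_arrays * per_array_mb / 1024             # about 220.7 GB
no_raw_gb    <- n_arrays * (norm_mb + chip_mb) / 1024      # about 113.3 GB

print(round(c(raw = raw_only_gb, total = total_gb, no_raw = no_raw_gb), 1))
```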

Here is what it looks like to estimate RMA chip effects (given that
you've set up the correct aroma directory structure):

# Load aroma.affymetrix, then run RMA on the data set
library("aroma.affymetrix");
ces <- doRMA("GSE24026", chipType="HT_HG-U133A");
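
For the directory structure mentioned above, here is a sketch of the
layout as I understand the aroma conventions: the CDF goes under
annotationData/chipTypes/<chipType>/ and the CEL files under
rawData/<dataSet>/<chipType>/.  The helper `setup_aroma_dirs` is
hypothetical (it is not part of aroma.affymetrix), and a temporary
directory stands in for your real working directory:

```r
# Hypothetical helper: create the two directories aroma is assumed to expect.
setup_aroma_dirs <- function(root, dataSet, chipType) {
  annot <- file.path(root, "annotationData", "chipTypes", chipType)
  raw   <- file.path(root, "rawData", dataSet, chipType)
  for (d in c(annot, raw)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  c(annotationData = annot, rawData = raw)
}

# Example with a throwaway root; copy the CDF and the CEL files in afterwards.
dirs <- setup_aroma_dirs(tempdir(), "GSE24026", "HT_HG-U133A")
print(dirs)
```

After this, the CDF file would be placed in the first directory and the
CEL files in the second, and R started from the root directory.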

For further questions about aroma.affymetrix, please head over to
http://aroma-project.org/forum/.

/Henrik

>>
>> /Henrik
>> (author of aroma.affymetrix)
>>
>> On Tue, Feb 21, 2012 at 5:49 AM, Ying Chen <Ying.Chen at imclone.com> wrote:
>> > Hi,
>> >
>> > You can try aroma.affymetrix, which is not a Bioconductor package yet.
>> > Or you can try the stand alone application RMAexpress as someone said he did
>> > a RMA on more than 10,000 cels with it.
>> >
>> > Ying
>> >
>> > -----Original Message-----
>> > From: bioconductor-bounces at r-project.org
>> > [mailto:bioconductor-bounces at r-project.org] On Behalf Of Saif Ur-Rehman
>> > [guest]
>> > Sent: Monday, February 20, 2012 12:09 PM
>> > To: bioconductor at r-project.org; saif.urrehman at icr.ac.uk
>> > Subject: [BioC] Reading 5000 celfiles with ReadAffy
>> >
>> >
>> > Hi,
>> >
>> > A student in my institute is trying to normalise >5000 celfiles
>> > generated on the U133A platform using the affy BioConductor library.
>> >
>> >
>> >
>> > Attempting to read in this many files results in an error in allocating
>> > the matrix which is as follows.
>> >
>> > allocMatrix: too many elements specified.
>> >
>> > As there is plenty of memory allocated to R this was surprising.
>> >
>> > Some Googling showed that there is a hard limit of +2,147,483,647 in the
>> > no. of columns in a matrix  specified by C  which leads to this error.
>> >
>> > I was just writing to ask if anyone had experience with normalisation of
>> > a large no. of celfiles and had encountered this problem and if so what if
>> > any solution you found?
>> >
>> > Thank you in advance.
>> >
>> > Sincerely,
>> > Saif Ur-Rehman
>> >
>> >  -- output of sessionInfo():
>> >
>> > NA
>> >
>> > --
>> > Sent via the guest posting facility at bioconductor.org.
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
