[BioC] large amount of slides
Marcus Davy
MDavy at hortresearch.co.nz
Tue Jun 8 23:38:36 CEST 2004
Hi,
you can use the function object.size to estimate the storage required by any
expression set object.
e.g.
> object.size(affybatch.example)
[1] 243384
> dim(exprs(affybatch.example))
[1] 10000 3
> object.size(exprs(affybatch.example))
[1] 240280
> object.size(exprs(affybatch.example)) /
+     (nrow(exprs(affybatch.example)) * ncol(exprs(affybatch.example)))
[1] 8.009333
Each double-precision matrix value takes 8 bytes of storage, so you can
estimate the amount of memory required for, say, n genes by 200 arrays,
plus annotation information etc.
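As a rough back-of-the-envelope sketch (the figure of 27000 genes below is
only illustrative, not from your data), the expression matrix alone for n
genes by 200 arrays needs about n * 200 * 8 bytes:

> n <- 27000                  # illustrative number of genes/probe sets
> n * 200 * 8 / 2^20          # approximate size of the matrix in Mb
[1] 41.19873

The full objects will be larger than this once annotation, phenoData and
probe-level information are included.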
On a *standard* Windows XP (or 2000) machine running R 1.9.0 you can
increase the addressable memory space with the --max-mem-size=2G argument
when you run the executable; details are in the R for Windows FAQ. Check
that it has increased with:
> memory.limit()
[1] 2147483648
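For example, from a Windows command prompt (the installation path below is
only a guess at a typical R 1.9.0 location; adjust it to yours), or by adding
the same flag to the Target field of the R desktop shortcut as described in
the FAQ:

"C:\Program Files\R\rw1090\bin\Rgui.exe" --max-mem-size=2G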
Memory-intensive algorithms can still run out of addressable memory on some
32-bit architectures for large datasets. For example, Bioconductor's siggenes
SAM permutation testing function with B=1000 permutations on 27000 genes is
likely to have problems on some 32-bit platforms, depending on physical
memory and the virtual memory (page file) available to the operating system.
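As a rough illustration (my arithmetic, not a measurement from siggenes), just
one double-precision matrix of permuted statistics for 27000 genes and B=1000
permutations takes

> 27000 * 1000 * 8 / 2^20     # Mb for a single 27000 x 1000 double matrix
[1] 205.9937

and R's copy-on-modify semantics can easily leave several temporary copies of
objects that size in memory at once, which is how a 32-bit address space gets
exhausted.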
marcus
>>> "Park, Richard" <Richard.Park at joslin.harvard.edu> 5/06/2004 3:40:42
AM >>>
Hi Vada,
I would caution you about running RMA on that many datasets. I have noticed
a trend in RMA that things get even more underestimated as the number and
variance of the data increase. I have been doing an analysis of immune cell
types with about 100 CEL files. My computer (Windows 2000, 2 GB of RAM,
2.6 GHz Pentium 4) gives out at around 70 datasets; I am pretty sure my
problem is that Windows 2000 has a maximum allocation of 1 GB.
But if most of your data is closely related (i.e. same tissues, just a KO vs
WT comparison) you should be fine with RMA. I would caution against using RMA
on data that is very different.
hth,
richard
-----Original Message-----
From: Vada Wilcox [mailto:v_wilcox at hotmail.com]
Sent: Friday, June 04, 2004 11:06 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] large amount of slides
Dear all,
I have been using RMA successfully for a while now, but in the past I have
only used it on a small number of slides. I would like to do my study on a
larger scale now, with data (series of experiments) from other researchers
as well. My question is the following: if I want to study, let's say, 200
slides, do I have to read them all into R at once (together, I mean, with
read.affy() in the affy package), or is it OK to read them series by series
(so all wild types and controls of one researcher at a time)?
If it really is necessary to read all of them in at one time, how much RAM
would I need (for, let's say, 200 CEL files) and how can I raise the RAM?
I know it's possible to raise it by using 'max vsize = ...' but I haven't
been able to do it successfully for 200 experiments. Can somebody help me
with this?
Many thanks in advance,
Vada
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor