[R] size limits
Mark Lamias
mlamias at isr.umich.edu
Sun Jan 23 20:45:42 CET 2000
See message below.
-----Original Message-----
From: Jeff Miller [mailto:jdm at xnet.com]
Sent: Sunday, January 23, 2000 12:44 PM
To: r-help at stat.math.ethz.ch
Subject: [R] size limits
Hi,
I have a few questions about how to handle large data sets in R.
What is the size of the largest matrix that R can comfortably deal with?
Is this size limit imposed by R itself, or does it depend on
the machine that one runs on?
How does one go about choosing reasonable values of vsize
and nsize?
I have a data set with about 1,000,000 rows and 30 columns
(1 character, 29 numeric), stored in a flat file.
When I run Splus-5 on a Solaris workstation, I can read this file quite
easily with
myData <- read.table(file = "mydata.dat")
and manipulate the data without any problems.
On the other hand, when I try to do the same on a PC (128 MB RAM,
400 MHz) running Linux (Red Hat 6.1) with R version 0.90.0, I find
that it is impossible.
When I allocate (what I believe to be) the maximum amount of vsize
memory and a large amount of nsize memory,
R --vsize 200M --nsize 4000k
and then try to read the file in using read.table() or scan(),
myData <- read.table(file = "mydata.dat")
or
myData <- scan(file = "myData.dat", what = list("",0,0,...,0))
(with 29 zeros)
I get kicked out of R.
More worrisome, I did succeed in reading in a subset of the data with
30,000 rows. However, when I tried to plot one of the columns, my
monitor began blinking wildly, and the machine crashed. I had to reboot.
I tried to read the R help page on memory, but wasn't able to
understand much of what it was saying.
Thanks much for any help,
Jeff Miller
NEW MESSAGE:
This sounds to me like a memory problem. The following information from
the R documentation (found by typing help(Memory)) may help.
--Mark Lamias
Mark J. Lamias
Department of Statistics
Department of Political Science
Survey Methodology Program/Survey Research Center
Institute for Social Research - University of Michigan
426 Thompson Street, Room 315
Ann Arbor, Michigan 48104-2321
(734) 647-5381
Memory(base) R Documentation
Memory Available for Data Storage
Description:
Use command line options to set the memory available
for R.
Usage:
R --vsize v --nsize n
Arguments:
v: Use `v' bytes of heap memory
n: Use `n' cons cells.
Details:
R (currently) uses a static memory model. This means
that when it starts up, it asks the operating system to
reserve a fixed amount of memory for it. The size of
this chunk cannot be changed subsequently. Hence, it
can happen that not enough memory was allocated, e.g.,
when trying to read large data sets into R.
In these cases, you should restart R (after saving your
current workspace) with more memory available, using
the command line options `--nsize' and `--vsize'. To
understand these options, one needs to know that R
maintains separate areas for fixed and variable sized
objects. The first of these is allocated as an array
of ``cons cells'' (Lisp programmers will know what they
are, others may think of them as the building blocks of
the language itself, parse trees, etc.), and the second
are thrown on a ``heap'' of ``Vcells'' (see
`gc()["Vcells","total"]') of 8 bytes each. Effectively,
the input `v' is therefore truncated to the nearest
multiple of 8.
The `--nsize' option can be used to specify the number
of cons cells (each occupying 16 bytes) which R is to
use (the default is 250000), and the `--vsize' option
to specify the size of the vector heap in bytes (the
default is 6 MB). Both options must be integers or
integers ending with `M', `K', or `k' meaning Mega (=
2^{20} = 1048576), (computer) Kilo (= 2^{10} = 1024),
or regular kilo (= 1000). (The minimum allowed values
are 200000 and 2M.)
E.g., to read in a table of 10000 observations on 40
numeric variables, `R --vsize 10M' should do; for
`source()'ing a large file, you'd use `R --nsize 500k'.
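As a rough sketch of where such figures come from: each numeric value
occupies one 8-byte Vcell on the heap, so a first estimate for a table
of numbers is rows x columns x 8 bytes (a real session needs headroom
on top of this for working copies):

     rows <- 10000; cols <- 40
     rows * cols * 8 / 2^20       # about 3.1M, so --vsize 10M is ample

     rows <- 1000000; cols <- 29  # the data set from the question
     rows * cols * 8 / 2^20       # about 221M, before any working copies

By that arithmetic, `--vsize 200M' is already slightly too small for the
full 1,000,000-row table, even before the extra copy that `read.table'
makes (see below) is counted.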
Note that the information on where to find vectors and
strings on the heap is stored using cons cells. Thus,
it may also be necessary to allocate more space for
cons cells in order to perform computations with very
``large'' variable-size objects.
You can find out the current memory consumption (the
proportion of heap and cons cells used) by typing
`gc()' at the R prompt. This may help you in finding
out whether to increase `--vsize' or `--nsize'. Note
that following `gcinfo(TRUE)', automatic garbage
collection always prints memory use statistics.
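For example (the `free' figures below are invented for illustration;
the totals correspond to the defaults quoted above, 6M of heap being
786432 eight-byte Vcells):

     gc()
     #          free  total
     # Ncells 215000 250000   # cons cells: raise --nsize if free gets low
     # Vcells 500000 786432   # heap Vcells: raise --vsize if free gets low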
R will tell you whether you ran out of cons or heap
memory.
The defaults for `--nsize' and `--vsize' can be changed
by setting the environment variables `R_NSIZE' and
`R_VSIZE' respectively, perhaps most conveniently on
Unix in the file `~/.Renviron'.
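So, for example, a `~/.Renviron' along the lines below would make larger
limits the default at every startup (the values are illustrative, and I
am assuming the `M'/`k' suffixes are accepted here just as on the
command line):

     R_VSIZE=250M
     R_NSIZE=1000k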
When using `read.table', the memory requirements are in
fact higher than anticipated, because the file is first
read in as one long string which is then split again.
Use `scan' if possible in case you run out of memory
when reading in a large table.
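Putting that together for the data set in the question, a sketch of the
`scan' route (assuming the character column comes first, followed by the
29 numeric ones):

     # rep(list(0), 29) supplies the 29 numeric templates without
     # typing them all out
     fields <- scan(file = "mydata.dat",
                    what = c(list(""), rep(list(0), 29)))
     myData <- as.data.frame(fields)  # note: this step copies the columns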
See Also:
`gc' for information on the garbage collector.
Examples:
# Start R with 15MB of heap memory and 1 million cons cells
R --vsize 15M --nsize 1000k
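Applied to the 1,000,000 x 30 table from the question, something along
these lines would be my starting point (the numbers are guesses based on
the arithmetic above, and with only 128 MB of physical RAM on the PC the
machine will likely swap heavily, so reading a subset or working in
chunks may still be the practical answer):

     R --vsize 250M --nsize 1000k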
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._