[R] 'object.size' takes a long time to return a value
Martin Maechler
maechler at stat.math.ethz.ch
Mon Dec 13 12:20:48 CET 2004
>>>>> "james" == james holtman <james.holtman at convergys.com>
>>>>> on Sun, 12 Dec 2004 17:03:31 -0500 writes:
james> I was using 'object.size' to see how much memory a
james> list was taking up. After executing the command, I
james> had thought that my computer had locked up. After
james> further testing, I determined that it was taking 241
james> seconds for object.size to return a value.
james> I did notice in the release notes that 'object.size'
james> did take longer when the list contained character
james> vectors. Is the time that it is taking 'object.size'
james> to return a value to be expected for such a list?
Yes, partly it's expected to take longer than for other types,
but it actually does take longer than I would have expected,
even after starting to think about it:
Every element of your character vector is a string which is
coded ``as a vector of bytes with a string terminator''
(simplification). To find a string length, i.e., what the R
function nchar() also does, "one" has to read all characters up
to the string terminator. That's much slower than just
using the hard-coded fact that an integer is 4 bytes or a double
is 8.
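
As a rough illustration of that difference, here is a small sketch.
The variable names are made up for this example, and the byte counts
that object.size() reports depend on the R version and platform, so
none are quoted here:

```r
## Illustration only: variable names are hypothetical, and the sizes
## reported by object.size() vary with R version and platform.
x.int <- integer(1000)
x.dbl <- double(1000)
## fixed-width elements: the size follows from the length alone
print(object.size(x.int))
print(object.size(x.dbl))

## strings differ in length, so each element has to be looked at
s <- c("a", "abcdefghij", paste(rep("x", 100), collapse = ""))
n.bytes <- nchar(s, type = "bytes")
print(n.bytes)
print(object.size(s))
```

For the numeric vectors the element width is known in advance; for the
character vector the per-element byte counts in n.bytes have to be
determined string by string.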
james> Much better results were obtained when the character
james> vectors were converted to factors.
Yes, since your factor had only a dozen or at most 175 levels,
and only these are character strings; the factor *data* are integers.
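
A minimal sketch of that point, using made-up levels rather than your
data (the reported sizes again depend on R version and platform):

```r
## Illustration only: 'lev', 'ch' and 'f' are hypothetical names,
## not taken from the original poster's data.
set.seed(1)
lev <- c("alpha", "beta", "gamma")
ch  <- sample(lev, 1e4, replace = TRUE)  # character vector
f   <- factor(ch)                        # same values stored as a factor

print(typeof(f))   # storage type of the factor data
print(levels(f))   # the only character strings kept by the factor
print(object.size(ch))
print(object.size(f))
```

Only the few levels are stored as strings; everything else in the
factor is integer codes pointing into them.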
However, what I say above does not explain everything about
the slowness of object.size( <character> ).
We would have to go into the C code and the exact implementation
of object.size() to see the reason - and think about possible
improvements.
BTW: Note that R saves memory when character elements are
"shared"; e.g., for me (on 64-bit Linux, 2.0.1patched),
> object.size(rep("abcedfghijklmn", 3))
[1] 152
> object.size(c("abcedfghijklmn", "ABCEDFGHIJKLMN", "ABCEDFGHijklmn"))
[1] 296
Here is some code to experiment further
which slowly constructs character vectors where (I think)
no "sharing" takes place:
rChar <- function(n, m, ch.set = c(LETTERS,letters))
{
    ## Purpose: create random character vector
    ## ----------------------------------------------------------------------
    ## Arguments: n: length of vector
    ##            m: "average" string length
    ## ----------------------------------------------------------------------
    ## Author: Martin Maechler, Date: 13 Dec 2004, 11:35
    sapply(rpois(n, lambda=m),
           function(m) paste(sample(ch.set, size=m), collapse=""))
}
lc <- rChar(1e5, 4)# already takes several seconds on a fast machine
## This is on 64-bit [AMD Athlon(tm) 64 Processor 2800+] "lynne":
system.time(print(object.size(lc)))
## [1] 7240464
## [1] 2.11 0.00 2.14 0.00 0.00
system.time(print(sum(nchar(lc)))) # which is **MUCH** faster
## [1] 399461
## [1] 0.02 0.00 0.02 0.00 0.00
## but still considerably slower
system.time(print(for(i in 1:10)sn <- sum(nchar(lc))))## 0.10
## than
lx <- rnorm(1e5)
system.time(print(for(i in 1:10)os <- object.size(lx)))## 0.01
##------------
Note that if we continue this topic, it should probably be moved
to R-devel, since it's getting technical and about R internals
(coded in C).
--
Martin Maechler, ETH Zurich