[R] How to speed up list access in R?
Thomas Nyberg
tomnyberg at gmail.com
Thu Oct 30 16:17:59 CET 2014
Hello,
I want to do the following: given a set of (number, value) pairs, I want
to create a list l so that l[[toString(number)]] returns the vector of
values associated with that number. My R code for this is hundreds of
times slower than the equivalent I would write in Python. I'm pretty new
to R, so I bet I'm using its data structures inefficiently, but I've
tried more or less everything I can think of and can't really speed it
up. I have done some profiling, which helped me find the problem areas,
but I couldn't speed things up even with that information. I suspect I'm
just fundamentally using R in a silly way.
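To make the goal concrete, here is a minimal sketch in Python on a toy input. The pairs and the function name group_values are made up for illustration; the keys are strings to mirror the toString(number) lookup described above.

```python
from collections import defaultdict


def group_values(pairs):
    """Collect the values for each number, keyed by the number's string form."""
    groups = defaultdict(list)
    for number, value in pairs:
        groups[str(number)].append(value)
    return dict(groups)


# Toy (number, value) pairs, invented for this example.
pairs = [(7, 3), (12, 9), (7, 1), (12, 4), (30, 10)]
grouped = group_values(pairs)
print(grouped["7"])  # [3, 1]
```

Values keep the order in which their pairs appear in the input.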
I've included code for the different versions. I wrote the Python code
in a style meant to be as clear as possible to R programmers. Thanks a
lot! Any help would be greatly appreciated!
Cheers,
Thomas
R code (two versions, depending on which if line is commented out):
-----
# Build 50,000 numbers: five samples of 10,000 drawn from 1:30000
numbers <- numeric(0)
for (i in 1:5) {
  numbers <- c(numbers, sample(1:30000, 10000))
}
# One random value in 1:10 for each number
values <- numeric(0)
for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
}
starttime <- Sys.time()
d = list()
for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  if (is.null(d[[number]])) {
  #if (!(number %in% names(d))) {
    d[[number]] <- c(value)
  } else {
    d[[number]] <- append(d[[number]], value)
  }
}
endtime <- Sys.time()
print(format(endtime - starttime))
-----
is.null version (as shown): "45.64791 secs"
%in% version (the commented-out line): "1.423056 mins"
Another version of R code:
-----
numbers <- numeric(0)
for (i in 1:5) {
  numbers <- c(numbers, sample(1:30000, 10000))
}
values <- numeric(0)
for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
}
starttime <- Sys.time()
d = list()
for (number in unique(numbers)) {
  d[[toString(number)]] <- numeric(0)
}
for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  d[[number]] <- append(d[[number]], value)
}
endtime <- Sys.time()
print(format(endtime - starttime))
-----
"47.15579 secs"
The Python code:
-----
import random
import time
numbers = []
for i in range(5):
    numbers += random.sample(range(30000), 10000)
values = []
for i in range(len(numbers)):
    values.append(random.randint(1, 10))
starttime = time.time()
d = {}
for i in range(len(numbers)):
    number = numbers[i]
    value = values[i]
    if d.has_key(number):
        d[number].append(value)
    else:
        d[number] = [value]
endtime = time.time()
print endtime - starttime, "seconds"
-----
0.123021125793 seconds
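The script above uses Python 2 syntax (dict.has_key and the print statement). For anyone rerunning the comparison on Python 3, the following is a sketch of an equivalent port. The function name build_index is an assumption introduced here for illustration, and time.perf_counter is swapped in for time.time; the timing will differ from the figure quoted above.

```python
import random
import time


def build_index(numbers, values):
    """Map each number to the list of values that appear alongside it."""
    d = {}
    for number, value in zip(numbers, values):
        if number in d:  # Python 3 replacement for d.has_key(number)
            d[number].append(value)
        else:
            d[number] = [value]
    return d


# Same data shape as above: five samples of 10,000 numbers drawn from 0..29999.
numbers = []
for _ in range(5):
    numbers += random.sample(range(30000), 10000)
values = [random.randint(1, 10) for _ in numbers]

starttime = time.perf_counter()
d = build_index(numbers, values)
endtime = time.perf_counter()
print(endtime - starttime, "seconds")
```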