[R] How to speed up list access in R?

Ista Zahn istazahn at gmail.com
Thu Oct 30 17:34:06 CET 2014


Bill beat me to it; I was just about to post the same thing. The R
split version is still slower than Python on my system, but the times
are now on the same order of magnitude, about a tenth of a second in
both cases.
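
For anyone skimming the thread, here is a minimal sketch of the split-based version with timing, using the variable names from Thomas's post. The set.seed() call and the vectorized set-up are my own additions for reproducibility, not part of the original benchmark, and the elapsed time will differ from machine to machine.

```r
# Sketch only: set.seed() and the vectorized set-up are assumptions added
# for reproducibility; they are not part of the original benchmark.
set.seed(1)
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

starttime <- Sys.time()
# split() groups all values by number in a single call; the list names are
# the numbers converted to character, so string keys work as in the loop.
d <- split(values, numbers)
endtime <- Sys.time()
print(format(endtime - starttime))

# Look up the values stored for one number by its string key.
key <- toString(numbers[1])
d[[key]]
```

The result is indexed the same way as the list built in the loop, i.e. d[[toString(number)]] returns the vector of values for that number.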

You can also speed up the set-up part by sampling all at once instead
of repeatedly, e.g.,

values <- sample(1:10, length(numbers), replace=TRUE)

instead of

values <- numeric(0)
for (i in 1:length(numbers)) {
    values <- append(values, sample(1:10, 1))
}
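
A rough way to compare the two set-up styles is sketched below. The length n is my assumption (5 draws of 10000 numbers, as in the original post), the names values_loop, values_vec, t_loop and t_vec are just for illustration, and the timings depend on the machine.

```r
n <- 50000  # assumed: 5 draws of 10000 numbers, as in the original set-up

# Growing the vector one element at a time creates a new, longer vector
# on every iteration.
t_loop <- system.time({
    values_loop <- numeric(0)
    for (i in seq_len(n)) {
        values_loop <- append(values_loop, sample(1:10, 1))
    }
})

# One call draws all n values; replace=TRUE is required because n > 10.
t_vec <- system.time({
    values_vec <- sample(1:10, n, replace = TRUE)
})

# Elapsed wall-clock seconds for each approach.
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

Both versions produce n draws from 1:10; only the way the vector is built differs.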

Best,
Ista

On Thu, Oct 30, 2014 at 12:05 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Repeatedly extending vectors takes a lot of time.  You can do what you want with
>   d2 <- split(values, factor(numbers, levels=unique(numbers)))
> If you would like the labels on d2 to be in numeric order then you can
> simplify that to
>   d3 <- split(values, numbers)
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>> Hello,
>>
>> I want to do the following: Given a set of (number, value) pairs, I want to
>> create a list l so that l[[toString(number)]] returns the vector of values
>> associated with that number. My R code is hundreds of times slower than the
>> equivalent that I would write in Python. I'm pretty new to R so I bet I'm
>> using its data structures inefficiently, but I've tried more or less
>> everything I can think of and can't really speed it up. I have done some
>> profiling which helped me find problem areas, but I couldn't speed things up
>> even with that information. I'm thinking I'm just fundamentally using R in a
>> silly way.
>>
>> I've included code for the different versions. I wrote the python code in a
>> style to make it as clear to R programmers as possible. Thanks a lot! Any
>> help would be greatly appreciated!
>>
>> Cheers,
>> Thomas
>>
>>
>> R code (with two versions depending on commenting):
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>     numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>     values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (i in 1:length(numbers)) {
>>     number = toString(numbers[i])
>>     value = values[i]
>>     if (is.null(d[[number]])) {
>>     #if (!(number %in% names(d))) {
>>         d[[number]] <- c(value)
>>     } else {
>>         d[[number]] <- append(d[[number]], value)
>>     }
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> uncommented version: "45.64791 secs"
>> commented version: "1.423056 mins"
>>
>>
>>
>> Another version of R code:
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>     numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>     values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (number in unique(numbers)) {
>>     d[[toString(number)]] <- numeric(0)
>> }
>> for (i in 1:length(numbers)) {
>>     number = toString(numbers[i])
>>     value = values[i]
>>     d[[number]] <- append(d[[number]], value)
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> "47.15579 secs"
>>
>>
>>
>> The python code:
>>
>> -----
>>
>> import random
>> import time
>>
>> numbers = []
>> for i in range(5):
>>     numbers += random.sample(range(30000), 10000)
>>
>> values = []
>> for i in range(len(numbers)):
>>     values.append(random.randint(1, 10))
>>
>> starttime = time.time()
>>
>> d = {}
>> for i in range(len(numbers)):
>>     number = numbers[i]
>>     value = values[i]
>>     if number in d:
>>         d[number].append(value)
>>     else:
>>         d[number] = [value]
>>
>> endtime = time.time()
>>
>> print("%s seconds" % (endtime - starttime))
>>
>> -----
>>
>> 0.123021125793 seconds
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


