[R] How to speed up list access in R?

Thu Oct 30 17:05:22 CET 2014

Repeatedly extending vectors takes a lot of time.  You can do what you want with
  d2 <- split(values, factor(numbers, levels=unique(numbers)))
If you would like the labels on d2 to be in numeric order, then you can
simplify that to
  d3 <- split(values, numbers)
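
A minimal, self-contained sketch of the two split() calls. It assumes data simulated like the original post's, but generated with single vectorized sample() calls instead of growing vectors in a loop. The seed and the 200-element size of the sanity check are arbitrary illustrative choices, not part of the thread.

```r
# Simulated data, generated without growing vectors in a loop.
# The seed is an arbitrary choice for reproducibility.
set.seed(42)
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

# One list element per distinct number, in order of first appearance;
# the element names are the numbers as character strings.
d2 <- split(values, factor(numbers, levels = unique(numbers)))

# Same groups, with names sorted in numeric order.
d3 <- split(values, numbers)

# Lookup with the same kind of key the original loop used.
k <- toString(numbers[1])
print(d2[[k]])

# Sanity check on a short prefix: the original append-style loop
# builds the same list that split() returns.
n_small <- 200
small_numbers <- numbers[seq_len(n_small)]
small_values  <- values[seq_len(n_small)]
loop_d <- list()
for (i in seq_len(n_small)) {
  key <- toString(small_numbers[i])
  loop_d[[key]] <- c(loop_d[[key]], small_values[i])
}
split_d <- split(small_values,
                 factor(small_numbers, levels = unique(small_numbers)))
stopifnot(identical(names(loop_d), names(split_d)),
          all(mapply(identical, loop_d, split_d)))
```

The point of the design is that split() does all the grouping in one call, rather than re-extending one list element for every input value.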

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
> Hello,
>
> I want to do the following: Given a set of (number, value) pairs, I want to
> create a list l so that l[[toString(number)]] returns the vector of values
> associated with that number. My R version is hundreds of times slower than
> the equivalent code I would write in Python. I'm pretty new to R, so I bet I'm
> using its data structures inefficiently, but I've tried more or less
> everything I can think of and can't really speed it up. I have done some
> profiling which helped me find problem areas, but I couldn't speed things up
> even with that information. I'm thinking I'm just fundamentally using R in a
> silly way.
>
> I've included code for the different versions. I wrote the Python code in a
> style meant to be as clear as possible to R programmers. Thanks a lot! Any
> help would be greatly appreciated!
>
> Cheers,
> Thomas
>
>
> R code (two versions, depending on which if() test is commented out):
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     if (is.null(d[[number]])) {
>     #if (!(number %in% names(d))) {
>         d[[number]] <- c(value)
>     } else {
>         d[[number]] <- append(d[[number]], value)
>     }
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> with the is.null() test (as shown): "45.64791 secs"
> with the %in% names(d) test instead: "1.423056 mins"
>
>
>
> Another version of R code:
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (number in unique(numbers)) {
>     d[[toString(number)]] <- numeric(0)
> }
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     d[[number]] <- append(d[[number]], value)
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> "47.15579 secs"
>
>
>
> The Python code:
>
> -----
>
> import random
> import time
>
> numbers = []
> for i in range(5):
>     numbers += random.sample(range(30000), 10000)
>
> values = []
> for i in range(len(numbers)):
>     values.append(random.randint(1, 10))
>
> starttime = time.time()
>
> d = {}
> for i in range(len(numbers)):
>     number = numbers[i]
>     value = values[i]
>     if number in d:
>         d[number].append(value)
>     else:
>         d[number] = [value]
>
> endtime = time.time()
>
> print("%s seconds" % (endtime - starttime))
>
> -----
>
> 0.123021125793 seconds
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.