[R] How to speed up list access in R?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Oct 30 17:00:14 CET 2014


Look at the sqldf or data.table packages. Lists are slow for lookup and not particularly memory-efficient. Numeric indexing into matrices or data frames is more typical in R, and the above-mentioned packages support indexes that speed up lookups. Also, carefully consider whether you can program your processing in bulk: vectorized or relational processing can be critical for performance.
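
As a rough illustration of the bulk approach, here is a minimal base R sketch that builds the same name-to-values list with one split() call instead of a loop. The seed and the variable n are illustrative assumptions, not part of the original post, and no timing is claimed for it.

```r
set.seed(1)  # arbitrary seed (assumption), only so the sketch is reproducible

# Same shape of data as in the quoted message, generated without growing vectors.
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

# Group all values by number in a single vectorized call.
# split() names each element with the character form of its grouping value.
d <- split(values, numbers)

# Lookup by name, mirroring l[[toString(number)]] from the question.
n <- numbers[1]  # illustrative key (assumption)
d[[as.character(n)]]
```

The result is still an ordinary named list, so existing code that reads d[["123"]] keeps working; only the construction step changes. A keyed data.table would be the analogous package-based route.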
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On October 30, 2014 8:17:59 AM PDT, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>Hello,
>
>I want to do the following: Given a set of (number, value) pairs, I want
>to create a list l so that l[[toString(number)]] returns the vector of
>values associated with that number. My R code is hundreds of times slower
>than the equivalent that I would write in Python. I'm pretty new to R so
>I bet I'm using its data structures inefficiently, but I've tried more or
>less everything I can think of and can't really speed it up. I have done
>some profiling which helped me find problem areas, but I couldn't speed
>things up even with that information. I'm thinking I'm just
>fundamentally using R in a silly way.
>
>I've included code for the different versions. I wrote the Python code
>in a style to make it as clear to R programmers as possible. Thanks a 
>lot! Any help would be greatly appreciated!
>
>Cheers,
>Thomas
>
>
>R code (with two versions depending on commenting):
>
>-----
>
>numbers <- numeric(0)
>for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
>}
>
>values <- numeric(0)
>for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
>}
>
>starttime <- Sys.time()
>
>d = list()
>for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     if (is.null(d[[number]])) {
>     #if (!(number %in% names(d))) {
>         d[[number]] <- c(value)
>     } else {
>         d[[number]] <- append(d[[number]], value)
>     }
>}
>
>endtime <- Sys.time()
>
>print(format(endtime - starttime))
>
>-----
>
>uncommented version: "45.64791 secs"
>commented version: "1.423056 mins"
>
>
>
>Another version of R code:
>
>-----
>
>numbers <- numeric(0)
>for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
>}
>
>values <- numeric(0)
>for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
>}
>
>starttime <- Sys.time()
>
>d = list()
>for (number in unique(numbers)) {
>     d[[toString(number)]] <- numeric(0)
>}
>for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     d[[number]] <- append(d[[number]], value)
>}
>
>endtime <- Sys.time()
>
>print(format(endtime - starttime))
>
>-----
>
>"47.15579 secs"
>
>
>
>The Python code:
>
>-----
>
>import random
>import time
>
>numbers = []
>for i in range(5):
>     numbers += random.sample(range(30000), 10000)
>
>values = []
>for i in range(len(numbers)):
>     values.append(random.randint(1, 10))
>
>starttime = time.time()
>
>d = {}
>for i in range(len(numbers)):
>     number = numbers[i]
>     value = values[i]
>     if number in d:
>         d[number].append(value)
>     else:
>         d[number] = [value]
>
>endtime = time.time()
>
>print("%s seconds" % (endtime - starttime))
>
>-----
>
>0.123021125793 seconds
>


