[R] How to speed up list access in R?

Thu Oct 30 16:48:41 CET 2014

Hi,

Perhaps pre-generating the list before processing would speed it up
significantly, though it may still be slower than Python.

For example, try something like:

d = as.list(rep(NA,length(numbers)))

rather than:

d = list()
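
To make the idea concrete, here is a minimal sketch of pre-generating
a named list with one slot per distinct number and then filling it in a
single step. It rests on assumptions that are not in the original post:
the names `keys` and `groups` are illustrative, the fixed seed is only
there for reproducibility, and `split()` is a different technique from
the append-in-a-loop code quoted below, so treat it as one possible
variant rather than a measured fix.

```r
# Rebuild data shaped like the quoted example (5 batches of 10000 numbers).
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
numbers <- as.vector(replicate(5, sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

# Pre-generate the list: one empty (NULL) slot per distinct number,
# named by the number converted to a string.
keys <- as.character(unique(numbers))
d <- setNames(vector("list", length(keys)), keys)

# Group the values by number in one vectorised call, then copy the
# groups into the pre-generated slots by name.
groups <- split(values, numbers)
d[names(groups)] <- groups

# Look up the values for one number, as in l[[toString(number)]].
k <- numbers[1]
print(d[[toString(k)]])
```

This avoids growing each element with append() inside the loop; whether
it closes the gap with Python would need to be timed on your machine.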

Olivier.

On Thu, 30 Oct 2014 11:17:59 -0400 Thomas Nyberg <tomnyberg at gmail.com> wrote:

> Hello,
> 
> I want to do the following: Given a set of (number, value) pairs, I
> want to create a list l so that l[[toString(number)]] returns the
> vector of values associated with that number. My R code is hundreds
> of times slower than the equivalent that I would write in Python. I'm pretty
> new to R so I bet I'm using its data structures inefficiently, but
> I've tried more or less everything I can think of and can't really
> speed it up. I have done some profiling which helped me find problem
> areas, but I couldn't speed things up even with that information. I'm
> thinking I'm just fundamentally using R in a silly way.
> 
> I've included code for the different versions. I wrote the Python
> code in a style to make it as clear to R programmers as possible.
> Thanks a lot! Any help would be greatly appreciated!
> 
> Cheers,
> Thomas
> 
> 
> R code (with two versions depending on commenting):
> 
> -----
> 
> numbers <- numeric(0)
> for (i in 1:5) {
>      numbers <- c(numbers, sample(1:30000, 10000))
> }
> 
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>      values <- append(values, sample(1:10, 1))
> }
> 
> starttime <- Sys.time()
> 
> d = list()
> for (i in 1:length(numbers)) {
>      number = toString(numbers[i])
>      value = values[i]
>      if (is.null(d[[number]])) {
>      #if (!(number %in% names(d))) {
>          d[[number]] <- c(value)
>      } else {
>          d[[number]] <- append(d[[number]], value)
>      }
> }
> 
> endtime <- Sys.time()
> 
> print(format(endtime - starttime))
> 
> -----
> 
> with the is.null() check (as shown): "45.64791 secs"
> with the %in% check instead: "1.423056 mins"
> 
> 
> 
> Another version of R code:
> 
> -----
> 
> numbers <- numeric(0)
> for (i in 1:5) {
>      numbers <- c(numbers, sample(1:30000, 10000))
> }
> 
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>      values <- append(values, sample(1:10, 1))
> }
> 
> starttime <- Sys.time()
> 
> d = list()
> for (number in unique(numbers)) {
>      d[[toString(number)]] <- numeric(0)
> }
> for (i in 1:length(numbers)) {
>      number = toString(numbers[i])
>      value = values[i]
>      d[[number]] <- append(d[[number]], value)
> }
> 
> endtime <- Sys.time()
> 
> print(format(endtime - starttime))
> 
> -----
> 
> "47.15579 secs"
> 
> 
> 
> The Python code:
> 
> -----
> 
> import random
> import time
> 
> numbers = []
> for i in range(5):
>      numbers += random.sample(range(30000), 10000)
> 
> values = []
> for i in range(len(numbers)):
>      values.append(random.randint(1, 10))
> 
> starttime = time.time()
> 
> d = {}
> for i in range(len(numbers)):
>      number = numbers[i]
>      value = values[i]
>      if number in d:
>          d[number].append(value)
>      else:
>          d[number] = [value]
> 
> endtime = time.time()
> 
> print(endtime - starttime, "seconds")
> 
> -----
> 
> 0.123021125793 seconds
> 

-- 
  Olivier Crouzet, PhD
  Laboratoire de Linguistique -- EA3827
  Université de Nantes
  Chemin de la Censive du Tertre - BP 81227
  44312 Nantes cedex 3
  France

     phone:        (+33) 02 40 14 14 05 (lab.)
                   (+33) 02 40 14 14 36 (office)
     fax:          (+33) 02 40 14 13 27
     e-mail:       olivier.crouzet at univ-nantes.fr

  http://www.lling.univ-nantes.fr/