[R] How to speed up list access in R?
Olivier Crouzet
olivier.crouzet at univ-nantes.fr
Thu Oct 30 16:48:41 CET 2014
Hi,
Perhaps pre-allocating the list before processing would speed it up
significantly, though it may still be slower than Python.
For example, try something like:
d = as.list(rep(NA,length(numbers)))
rather than:
d = list()
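
A minimal sketch of what pre-allocation could look like here, offered as an
illustration rather than a tested fix. It assumes one slot per distinct
number, with integer positions computed once by match(); the names keys, idx
and d2 and the fixed seed are made up for the example. The last line shows
split(), a different base R approach (grouping, not pre-allocation) that
should build the same structure.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Same shape of data as in the quoted message, built without growing vectors.
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))
values  <- sample(1:10, length(numbers), replace = TRUE)

# Pre-allocate one empty slot per distinct number and name the slots.
keys <- unique(numbers)
d <- vector("list", length(keys))
names(d) <- as.character(keys)

# Position of each number's slot, computed once up front.
idx <- match(numbers, keys)

# Fill the slots by integer position instead of by name.
for (i in seq_along(numbers)) {
  j <- idx[i]
  d[[j]] <- c(d[[j]], values[i])
}

# Alternative without an explicit loop: group the values by number.
d2 <- split(values, numbers)
```

Afterwards d[[toString(n)]] returns the values recorded for n, as in the
original loop. Indexing by position inside the loop avoids a name lookup on
every iteration, which is where I would expect most of the time to go.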
Olivier.
On Thu, 30 Oct 2014 11:17:59 -0400 Thomas Nyberg <tomnyberg at gmail.com> wrote:
> Hello,
>
> I want to do the following: Given a set of (number, value) pairs, I
> want to create a list l so that l[[toString(number)]] returns the
> vector of values associated to that number. It is hundreds of times
> slower than the equivalent that I would write in python. I'm pretty
> new to R so I bet I'm using its data structures inefficiently, but
> I've tried more or less everything I can think of and can't really
> speed it up. I have done some profiling which helped me find problem
> areas, but I couldn't speed things up even with that information. I'm
> thinking I'm just fundamentally using R in a silly way.
>
> I've included code for the different versions. I wrote the python
> code in a style to make it as clear to R programmers as possible.
> Thanks a lot! Any help would be greatly appreciated!
>
> Cheers,
> Thomas
>
>
> R code (with two versions depending on commenting):
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     if (is.null(d[[number]])) {
>     #if (number %in% names(d)) {
>         d[[number]] <- c(value)
>     } else {
>         d[[number]] <- append(d[[number]], value)
>     }
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> uncommented version: "45.64791 secs"
> commented version: "1.423056 mins"
>
>
>
> Another version of R code:
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (number in unique(numbers)) {
>     d[[toString(number)]] <- numeric(0)
> }
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     d[[number]] <- append(d[[number]], value)
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> "47.15579 secs"
>
>
>
> The python code:
>
> -----
>
> import random
> import time
>
> numbers = []
> for i in range(5):
>     numbers += random.sample(range(30000), 10000)
>
> values = []
> for i in range(len(numbers)):
>     values.append(random.randint(1, 10))
>
> starttime = time.time()
>
> d = {}
> for i in range(len(numbers)):
>     number = numbers[i]
>     value = values[i]
>     if d.has_key(number):
>         d[number].append(value)
>     else:
>         d[number] = [value]
>
> endtime = time.time()
>
> print endtime - starttime, "seconds"
>
> -----
>
> 0.123021125793 seconds
--
Olivier Crouzet, PhD
Laboratoire de Linguistique -- EA3827
Université de Nantes
Chemin de la Censive du Tertre - BP 81227
44312 Nantes cedex 3
France
phone: (+33) 02 40 14 14 05 (lab.)
(+33) 02 40 14 14 36 (office)
fax: (+33) 02 40 14 13 27
e-mail: olivier.crouzet at univ-nantes.fr
http://www.lling.univ-nantes.fr/