[R] How to speed up list access in R?
William Dunlap
wdunlap at tibco.com
Thu Oct 30 18:05:06 CET 2014
You can try using an environment instead of a list.
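
A minimal sketch of that idea, assuming numbers and values are parallel
vectors as in the original post (the tiny inputs below are made up for
illustration). An environment created with new.env(hash = TRUE) looks keys
up through a hash table, and `[[` on an environment returns NULL for a
missing key, so no explicit existence check is needed:

```r
# Toy stand-ins for the vectors from the original post (illustrative values).
numbers <- c(3, 1, 3, 2, 1, 3)
values  <- c(10, 20, 30, 40, 50, 60)

# A hashed environment used as a mutable, string-keyed map.
d <- new.env(hash = TRUE)
for (i in seq_along(numbers)) {
  key <- as.character(numbers[i])
  # d[[key]] is NULL for an unseen key, and c(NULL, x) is just x.
  d[[key]] <- c(d[[key]], values[i])
}

print(d[["3"]])      # values collected for number 3, in input order
print(sort(ls(d)))   # keys currently stored in the map
```

The per-key vectors are still extended with c(), so this only addresses the
cost of the lookup by name. as.list(d) converts the result back to an
ordinary named list if one is needed afterwards.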
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Oct 30, 2014 at 10:02 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
> Thanks to everyone for the help! For the moment I'll stick with Bill's
> solution, but I'll check out the other recommendations as well.
>
> Regarding the issue of slow lookups for lists, are there any hash map
> implementations in R that are faster? I like using fairly simple logic and
> data structures when prototyping and then only optimize code when and where
> it's necessary which is why I'm curious about these basic objects.
>
> On another note, is there a vector style implementation that changes the
> vectors in place? If I'm not mistaken, the append operation creates and
> returns a new vector each time, which is in line with the functional nature of
> R. If there were some way to have it mutable, it could be much faster. This
> is fairly standard in many languages. Behind the scenes memory is allocated
> at say 2 times the current size so that you only need log(n) extensions when
> building up a vector like this. Are there any such equivalents in R? I
> presume that lists are mutable (am I wrong?), but they seem to have the
> lookup slowdown problem.
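
The doubling strategy described above can be sketched by hand in R. The
helper name make_buffer below is hypothetical (it is not part of base R or
of any package); it keeps an over-allocated vector inside a closure and
doubles its length only when it is full. Whether each element assignment
avoids a copy depends on R's internal reference tracking, so treat this as
an illustration of the amortized-growth idea rather than a guarantee:

```r
# Hypothetical helper (not a base R function): a growable numeric buffer that
# doubles its capacity when full, so n appends need about log2(n) extensions.
make_buffer <- function(capacity = 16L) {
  data  <- numeric(capacity)  # over-allocated storage
  n     <- 0L                 # number of slots actually in use
  grows <- 0L                 # how many times the storage was extended
  push <- function(x) {
    if (n == length(data)) {
      # Extend the storage to twice its size (new slots are filled with NA).
      length(data) <<- 2L * length(data)
      grows <<- grows + 1L
    }
    n <<- n + 1L
    data[n] <<- x
    invisible(NULL)
  }
  list(push  = push,
       get   = function() data[seq_len(n)],
       grows = function() grows)
}

buf <- make_buffer(capacity = 4L)
for (v in 1:100) buf$push(v)
print(length(buf$get()))  # number of stored elements
print(buf$grows())        # number of extensions performed
```

When the final length is known in advance, allocating the whole vector once
with numeric(n) and filling it by index avoids the extensions entirely.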
>
> Again thanks a lot!
>
> Cheers,
> Thomas
>
>
> On 10/30/2014 12:05 PM, William Dunlap wrote:
>>
>> Repeatedly extending vectors takes a lot of time. You can do what you
>> want with
>> d2 <- split(values, factor(numbers, levels=unique(numbers)))
>> If you would like the labels on d2 to be in numeric order then you can
>> simplify that to
>> d3 <- split(values, numbers)
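
For illustration, here is what the two split() calls return on a tiny
made-up input (the vectors below are stand-ins, not the data from the
original post):

```r
# Toy stand-ins for the vectors from the original post (illustrative values).
numbers <- c(3, 1, 3, 2, 1, 3)
values  <- c(10, 20, 30, 40, 50, 60)

# Groups appear in order of first occurrence of each number.
d2 <- split(values, factor(numbers, levels = unique(numbers)))

# Groups appear in increasing numeric order of the numbers.
d3 <- split(values, numbers)

print(names(d2))
print(names(d3))
print(d3[["3"]])   # all values paired with the number 3
```

Both results are named lists, so d3[["3"]] plays the role of
l[[toString(number)]] from the original question.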
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I want to do the following: Given a set of (number, value) pairs, I want to
>>> create a list l so that l[[toString(number)]] returns the vector of values
>>> associated with that number. My R code is hundreds of times slower than the
>>> equivalent that I would write in Python. I'm pretty new to R so I bet I'm
>>> using its data structures inefficiently, but I've tried more or less
>>> everything I can think of and can't really speed it up. I have done some
>>> profiling which helped me find problem areas, but I couldn't speed things up
>>> even with that information. I'm thinking I'm just fundamentally using R in a
>>> silly way.
>>>
>>> I've included code for the different versions. I wrote the Python code in a
>>> style to make it as clear to R programmers as possible. Thanks a lot! Any
>>> help would be greatly appreciated!
>>>
>>> Cheers,
>>> Thomas
>>>
>>>
>>> R code (with two versions depending on commenting):
>>>
>>> -----
>>>
>>> numbers <- numeric(0)
>>> for (i in 1:5) {
>>>   numbers <- c(numbers, sample(1:30000, 10000))
>>> }
>>>
>>> values <- numeric(0)
>>> for (i in 1:length(numbers)) {
>>>   values <- append(values, sample(1:10, 1))
>>> }
>>>
>>> starttime <- Sys.time()
>>>
>>> d = list()
>>> for (i in 1:length(numbers)) {
>>>   number = toString(numbers[i])
>>>   value = values[i]
>>>   if (is.null(d[[number]])) {
>>>   #if (!(number %in% names(d))) {
>>>     d[[number]] <- c(value)
>>>   } else {
>>>     d[[number]] <- append(d[[number]], value)
>>>   }
>>> }
>>>
>>> endtime <- Sys.time()
>>>
>>> print(format(endtime - starttime))
>>>
>>> -----
>>>
>>> uncommented version: "45.64791 secs"
>>> commented version: "1.423056 mins"
>>>
>>>
>>>
>>> Another version of R code:
>>>
>>> -----
>>>
>>> numbers <- numeric(0)
>>> for (i in 1:5) {
>>>   numbers <- c(numbers, sample(1:30000, 10000))
>>> }
>>>
>>> values <- numeric(0)
>>> for (i in 1:length(numbers)) {
>>>   values <- append(values, sample(1:10, 1))
>>> }
>>>
>>> starttime <- Sys.time()
>>>
>>> d = list()
>>> for (number in unique(numbers)) {
>>>   d[[toString(number)]] <- numeric(0)
>>> }
>>> for (i in 1:length(numbers)) {
>>>   number = toString(numbers[i])
>>>   value = values[i]
>>>   d[[number]] <- append(d[[number]], value)
>>> }
>>>
>>> endtime <- Sys.time()
>>>
>>> print(format(endtime - starttime))
>>>
>>> -----
>>>
>>> "47.15579 secs"
>>>
>>>
>>>
>>> The python code:
>>>
>>> -----
>>>
>>> import random
>>> import time
>>>
>>> numbers = []
>>> for i in range(5):
>>>     numbers += random.sample(range(30000), 10000)
>>>
>>> values = []
>>> for i in range(len(numbers)):
>>>     values.append(random.randint(1, 10))
>>>
>>> starttime = time.time()
>>>
>>> d = {}
>>> for i in range(len(numbers)):
>>>     number = numbers[i]
>>>     value = values[i]
>>>     if d.has_key(number):
>>>         d[number].append(value)
>>>     else:
>>>         d[number] = [value]
>>>
>>> endtime = time.time()
>>>
>>> print endtime - starttime, "seconds"
>>>
>>> -----
>>>
>>> 0.123021125793 seconds
>>>