# [R] complexity of operations in R

Jan van der Laan rhelp at eoos.dds.nl
Thu Jul 19 22:00:17 CEST 2012

```When the length of the end result is not known, doubling the length of
the list is also much faster than increasing the size of the list with
single items.

f <- function(n, preallocate) {
v <- if(preallocate) vector("list",n) else list() ;
for(i in seq_len(n)) {
v[[i]] <- i
}
v
}

g <- function(n) {
N <- 1000
v <- vector("list", N)
for(i in seq_len(n)) {
if (i > N) {
N <- 2 * N
length(v) <- N
}
v[[i]] <- i
}
v[1:i]
}

> system.time(f(5E4, TRUE))
user  system elapsed
0.968   0.000   0.975
> system.time(f(5E4, FALSE))
user  system elapsed
52.611   0.136  54.197
> system.time(g(5E4))
user  system elapsed
1.388   0.008   1.424

What causes these differences? I can imagine that the time needed for
memory allocations play a role: multiple small allocations will be
smaller than one large allocation. But that doesn't explain the
quadratic growth in time. I would expect that to be linear. When doing
v[[i]] <- i the list isn't copied, right?

Jan

On 07/19/2012 06:21 PM, William Dunlap wrote:
> Preallocation of lists does speed things up.  The following shows
> time quadratic in size when there is no preallocation and linear
> growth when there is, for size in the c. 10^4 to 10^6 region:
>> f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ; for(i in seq_len(n)) v[[i]] <- i ; v }
>> identical(f(17,pre=TRUE), f(17,pre=FALSE))
>  TRUE
>> system.time(f(n=1e4, preallocate=FALSE))
>     user  system elapsed
>    0.324   0.000   0.326
>> system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time
>     user  system elapsed
>    1.316   0.012   1.329
>> system.time(f(n=4e4, preallocate=FALSE)) # ditto
>     user  system elapsed
>    5.720   0.028   5.754
>>
>> system.time(f(n=1e4, preallocate=TRUE))
>     user  system elapsed
>    0.016   0.000   0.017
>> system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time
>     user  system elapsed
>    0.032   0.004   0.036
>> system.time(f(n=4e4, preallocate=TRUE)) # ditto
>     user  system elapsed
>    0.068   0.000   0.069
>>
>> system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time
>     user  system elapsed
>    0.688   0.000   0.688
>
> Above 10^6 there is some superlinearity
>> system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time
>     user  system elapsed
>   11.125   0.052  11.181
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Bert Gunter
>> Sent: Thursday, July 19, 2012 9:11 AM
>> Cc: r-help at r-project.org
>> Subject: Re: [R] complexity of operations in R
>>
>>
>> Indeed. And using a loop is a poor way to do it anyway.
>>
>> v <- as.list(rep(FALSE,dotot))
>>
>> is way faster.
>>
>> -- Bert
>>
>> On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham <hadley at rice.edu> wrote:
>>> On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>> Johan,
>>>>
>>>> Your 'list' and 'array doubling' code can be written much more efficient.
>>>>
>>>> The following function is faster than your g and easier to read:
>>>>
>>>> g2 <- function(dotot) {
>>>>    v <- list()
>>>>    for (i in seq_len(dotot)) {
>>>>      v[[i]] <- FALSE
>>>>    }
>>>> }
>>>
>>> Except that you don't need to pre-allocate lists...
>>>
>>>
>>> --
>>> Assistant Professor / Dobelman Family Junior Chair
>>> Department of Statistics / Rice University
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>> biostatistics/pdb-ncb-home.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help