[R] complexity of operations in R

Thu Jul 19 19:06:58 CEST 2012

On Thu, Jul 19, 2012 at 11:11 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> Hadley et. al:
>
> Indeed. And using a loop is a poor way to do it anyway.
>
> v <- as.list(rep(FALSE,dotot))
>
> is way faster.
>
> -- Bert
>

Its not entirely clear to me what we are supposed to conclude about this.

I can confirm Bert's claim that his way is much  faster.  I ran
Johan's code, then Bert's suggestion.  The speedup is almost
unbelievable.

> system.time(f(dotot))

   user  system elapsed
 25.249   0.332  25.606
> #system.time(g(dotot))
> system.time(h(dotot))
   user  system elapsed
 15.753   0.404  16.173
> p <- function(dotot){v <- as.list(rep(FALSE,dotot))}
> system.time(p(dotot))
   user  system elapsed
  0.004   0.000   0.004

Why is this so much faster?

> f
function(dotot){
  v<-matrix(ncol=1,nrow=0)
  for(i in 1:dotot){
    v<-rbind(v,FALSE)
  }
  return(v)
}

Obvious problems.

1. The return() at the end is unnecessary. Cutting that off saves 4%
or so on time.

>
> f2 <- function(dotot){
+   v<-matrix(ncol=1,nrow=0)
+   for(i in 1:dotot){
+     v<-rbind(v,FALSE)
+   }
+   v
+ }
> system.time(f2(dotot))
   user  system elapsed
 24.150   0.220  24.391

2. Use of rbind is SLOW.  Calling rbind over and over again is VERY
slow (http://pj.freefaculty.org/R/WorkingExamples/stackListItems.Rout).

3. Calling matrix over and over to create nothing is SLOW.

If we just want a giant empty list, do this:

mylst <- vector(mode="list", length=dotot)

> system.time(mylst <- vector(mode="list", length=dotot))
   user  system elapsed
  0.000   0.000   0.001

In a practical application, where something has to be created to go
into the list, I think some speed considerations should be:

1. When a 'thing' is created and put in the list, do we need to
allocate memory twice like this

x <- do.whatever()

mylst[[999]] <- x

This will create x, and then make a duplicate of x to go into mylst. BORING!

if x is a big thing, I think better to avoid copying by making it anonymous:

mylst[[999]] <- do.whatever()

If memory is a consideration at all, this is better, and it avoids the
problem of allocating memory twice.

Anyway, it seems to me the point of this thread should be that
allocating a big giant list of nothings is arbitrarily fast, but it is
still not known

1.  what is the efficient way to create things to go into the list,

2. what is the best way to make use of the results once they are collected.

I'm sure for looping and calling rbind lots of times is a really bad
idea.  Everything else is on the table for discussion. Its not the for
loop that makes the original f function slow, its repeated use of
matrix and rbind.

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu