[R] Peformance question

Fri Apr 21 13:20:50 CEST 2017

I dont understand your code. But I do have suggestion. Run the functions in
the profiler, maybe differences will point at the enemy.

Know what I mean?

Rprof('check.out')
#run code
Rprof(NULL)
summaryRprof('check.out')

Do that for each method. That may be uninformative.

I wondered if you tried to compile your functions? In some cases it helps
erase differences like this. Norman Matloff has examples like that in Art
of R Programming.

I keep a list of things that are slow, if we can put finger on problem, I
will add to list. I suspect slow here is in runtime object lookup. The
environment ones have info located more quickly by the runtime, I expect.
Also, passing info back and forth from the R runtime system using [ is a
common cause of slow. It is why everybody yells 'vectorize' and 'use
lapply' all the time.  Then again, I'm guessing because I dont understand
your code:)

Good luck,
PJ

On Apr 11, 2017 7:44 PM, "Thomas Mailund" <thomas.mailund at gmail.com> wrote:

Hi y’all,

I’m working on a book on how to implement functional data structures in R,
and in particular on a chapter on implementing queues. You get get the
current version here https://www.dropbox.com/s/9c2yk3a67p1ypmr/book.pdf?dl=0
and the relevant pages are 50-59. I’ve implemented three versions of the
same idea, implementing a queue using two linked lists. One list contains
the elements you add to the end of a list, the other contains the elements
at the front of the list, and when you try to get an element from a list
and the front-list is empty you move elements from the back-list to the
front. The asymptotic analysis is explained in this figure
https://www.dropbox.com/s/tzi84zmyq16hdx0/queue-
amortized-linear-bound.png?dl=0 and all my implementations do get a linear
time complexity when I evaluate them on a linear number of operations.
However, the two implementations that uses environments seem to be almost
twice as fast as the implementation that gives me a persistent data
structure (see https://www.dropbox.com/s/i9dyab9ordkm0xj/queue-
comparisons.png?dl=0), and I cannot figure out why.

The code below contains the implementation of all three versions of the
queue plus the code I use to measure their performances. I’m sorry it is a
little long, but it is a minimal implementation of all three variants, the
comments just make it look longer than it really is.

Since the three implementations are doing basically the same things, I am a
little stumped about why the performance is so consistently different.

Can anyone shed some light on this, or help me figure out how to explore
this further?

Cheers

Thomas

## Implementations of queues ##################

#' Test if a data structure is empty
#' @param x The data structure
#' @return TRUE if x is empty.
#' @export
is_empty <- function(x) UseMethod("is_empty")

#' Add an element to a queue
#' @param x A queue
#' @param elm An element
#' @return an updated queue where the element has been added
#' @export
enqueue <- function(x, elm) UseMethod("enqueue")

#' Get the front element of a queue
#' @param x A queue
#' @return the front element of the queue
#' @export
front <- function(x) UseMethod("front")

#' Remove the front element of a queue
#' @param x The queue
#' @return The updated queue
#' @export
dequeue <- function(x) UseMethod("dequeue")

## Linked lists #########################

#' Add a head item to a linked list.
#' @param elem  The item to put at the head of the list.
#' @param lst   The list -- it will become the tail of the new list.
#' @return a new linked list.
#' @export
list_cons <- function(elem, lst)
  structure(list(head = elem, tail = lst), class = "linked_list")

list_nil <- list_cons(NA, NULL)

#' @method is_empty linked_list
#' @export
is_empty.linked_list <- function(x) identical(x, list_nil)

#' Create an empty linked list.
#' @return an empty linked list.
#' @export
empty_list <- function() list_nil

#' Get the item at the head of a linked list.
#' @param lst The list
#' @return The element at the head of the list.
#' @export
list_head <- function(lst) lst$head

#' Get the tail of a linked list.
#' @param lst The list
#' @return The tail of the list
#' @export
list_tail <- function(lst) lst$tail

#' Reverse a list
#' @param lst A list
#' @return the reverse of lst
#' @export
list_reverse <- function(lst) {
  acc <- empty_list()
  while (!is_empty(lst)) {
    acc <- list_cons(list_head(lst), acc)
    lst <- list_tail(lst)
  }
  acc
}

## Environment queues #################################################

queue_environment <- function(front, back) {
  e <- new.env(parent = emptyenv())
  e$front <- front
  e$back <- back
  class(e) <- c("env_queue", "environment")
  e
}

#' Construct an empty closure based queue
#' @return an empty queue
#' @export
empty_env_queue <- function()
  queue_environment(empty_list(), empty_list())

#' @method is_empty env_queue
#' @export
is_empty.env_queue <- function(x)
  is_empty(x$front) && is_empty(x$back)

#' @method enqueue env_queue
#' @export
enqueue.env_queue <- function(x, elm) {
  x$back <- list_cons(elm, x$back)
  x
}

#' @method front env_queue
#' @export
front.env_queue <- function(x) {
  if (is_empty(x$front)) {
    x$front <- list_reverse(x$back)
    x$back <- empty_list()
  }
  list_head(x$front)
}

#' @method dequeue env_queue
#' @export
dequeue.env_queue <- function(x) {
  if (is_empty(x$front)) {
    x$front <- list_reverse(x$back)
    x$back <- empty_list()
  }
  x$front <- list_tail(x$front)
  x
}

## Closure queues #####################################################

queue <- function(front, back)
  list(front = front, back = back)

queue_closure <- function() {
  q <- queue(empty_list(), empty_list())

  get_queue <- function() q

  queue_is_empty <- function() is_empty(q$front) && is_empty(q$back)

  enqueue <- function(elm) {
    q <<- queue(q$front, list_cons(elm, q$back))
  }

  front <- function() {
    if (queue_is_empty()) stop("Taking the front of an empty list")
    if (is_empty(q$front)) {
      q <<- queue(list_reverse(q$back), empty_list())
    }
    list_head(q$front)
  }

  dequeue <- function() {
    if (queue_is_empty()) stop("Taking the front of an empty list")
    if (is_empty(q$front)) {
      q <<- queue(list_tail(list_reverse(q$back)), empty_list())
    } else {
      q <<- queue(list_tail(q$front), q$back)
    }
  }

  structure(list(is_empty = queue_is_empty,
                 get_queue = get_queue,
                 enqueue = enqueue,
                 front = front,
                 dequeue = dequeue),
            class = "closure_queue")
}

#' Construct an empty closure based queue
#' @return an empty queue
#' @export
empty_closure_queue <- function() queue_closure()

#' @method is_empty closure_queue
#' @export
is_empty.closure_queue <- function(x) x$is_empty()

#' @method enqueue closure_queue
#' @export
enqueue.closure_queue <- function(x, elm) {
  x$enqueue(elm)
  x
}

#' @method front closure_queue
#' @export
front.closure_queue <- function(x) x$front()

#' @method dequeue closure_queue
#' @export
dequeue.closure_queue <- function(x) {
  x$dequeue()
  x
}

## Extended (purely functional) queues ################################
queue_extended <- function(x, front, back)
  structure(list(x = x, front = front, back = back),
            class = "extended_queue")

#' Construct an empty extended queue
#'
#' This is just a queue that doesn't use a closure to be able to update
#' the data structure when front is called.
#'
#' @return an empty queue
#' @export
empty_extended_queue <- function() queue_extended(NA, empty_list(),
empty_list())

#' @method is_empty extended_queue
#' @export
is_empty.extended_queue <- function(x)
  is_empty(x$front) && is_empty(x$back)

#' @method enqueue extended_queue
#' @export
enqueue.extended_queue <- function(x, elm)
  queue_extended(ifelse(is_empty(x$back), elm, x$x),
                 x$front, list_cons(elm, x$back))

#' @method front extended_queue
#' @export
front.extended_queue <- function(x) {
  if (is_empty(x)) stop("Taking the front of an empty list")
  if (is_empty(x$front)) x$x
  else list_head(x$front)
}

#' @method dequeue extended_queue
#' @export
dequeue.extended_queue <- function(x) {
  if (is_empty(x)) stop("Taking the front of an empty list")
  if (is_empty(x$front))
    x <- queue_extended(NA, list_reverse(x$back), empty_list())
  queue_extended(x$x, list_tail(x$front), x$back)
}

## Performance experiments ######################

library(microbenchmark)
library(tibble)
library(ggplot2)

get_performance_n <- function(
  algo
  , n
  , setup
  , evaluate
  , times
  , ...) {

  config <- setup(n)
  benchmarks <- microbenchmark(evaluate(n, config), times = times)
  tibble(algo = algo, n = n, time = benchmarks$time / 1e9) # time in sec
}

get_performance <- function(
  algo
  , ns
  , setup
  , evaluate
  , times = 10
  , ...) {
  f <- function(n)
    get_performance_n(algo, n, setup, evaluate, times = times, ...)
  results <- Map(f, ns)
  do.call('rbind', results)
}

setup <- function(n) n
evaluate <- function(empty) function(n, x) {
  elements <- 1:n
  queue <- empty
  for (elm in elements) {
    queue <- enqueue(queue, elm)
  }
  for (i in seq_along(elements)) {
    queue <- dequeue(queue)
  }
}

ns <- seq(5000, 10000, by = 1000)
performance <- rbind(get_performance("explicity environment", ns, setup,
evaluate(empty_env_queue())),
                     get_performance("closure environment", ns, setup,
evaluate(empty_closure_queue())),
                     get_performance("functional queue", ns, setup,
evaluate(empty_extended_queue())))

ggplot(performance, aes(x = as.factor(n), y = time / n, fill = algo)) +
  geom_boxplot() +
  scale_fill_grey("Data structure") +
  xlab(quote(n)) + ylab(expression(Time / n)) + theme_minimal()

        [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

	[[alternative HTML version deleted]]