[R] Peformance question
Paul Johnson
pauljohn32 at gmail.com
Fri Apr 21 13:20:50 CEST 2017
I dont understand your code. But I do have suggestion. Run the functions in
the profiler, maybe differences will point at the enemy.
Know what I mean?
Rprof('check.out')
#run code
Rprof(NULL)
summaryRprof('check.out')
Do that for each method. That may be uninformative.
I wondered if you tried to compile your functions? In some cases it helps
erase differences like this. Norman Matloff has examples like that in Art
of R Programming.
I keep a list of things that are slow, if we can put finger on problem, I
will add to list. I suspect slow here is in runtime object lookup. The
environment ones have info located more quickly by the runtime, I expect.
Also, passing info back and forth from the R runtime system using [ is a
common cause of slow. It is why everybody yells 'vectorize' and 'use
lapply' all the time. Then again, I'm guessing because I dont understand
your code:)
Good luck,
PJ
On Apr 11, 2017 7:44 PM, "Thomas Mailund" <thomas.mailund at gmail.com> wrote:
Hi y’all,
I’m working on a book on how to implement functional data structures in R,
and in particular on a chapter on implementing queues. You get get the
current version here https://www.dropbox.com/s/9c2yk3a67p1ypmr/book.pdf?dl=0
and the relevant pages are 50-59. I’ve implemented three versions of the
same idea, implementing a queue using two linked lists. One list contains
the elements you add to the end of a list, the other contains the elements
at the front of the list, and when you try to get an element from a list
and the front-list is empty you move elements from the back-list to the
front. The asymptotic analysis is explained in this figure
https://www.dropbox.com/s/tzi84zmyq16hdx0/queue-
amortized-linear-bound.png?dl=0 and all my implementations do get a linear
time complexity when I evaluate them on a linear number of operations.
However, the two implementations that uses environments seem to be almost
twice as fast as the implementation that gives me a persistent data
structure (see https://www.dropbox.com/s/i9dyab9ordkm0xj/queue-
comparisons.png?dl=0), and I cannot figure out why.
The code below contains the implementation of all three versions of the
queue plus the code I use to measure their performances. I’m sorry it is a
little long, but it is a minimal implementation of all three variants, the
comments just make it look longer than it really is.
Since the three implementations are doing basically the same things, I am a
little stumped about why the performance is so consistently different.
Can anyone shed some light on this, or help me figure out how to explore
this further?
Cheers
Thomas
## Implementations of queues ##################
#' Test if a data structure is empty
#' @param x The data structure
#' @return TRUE if x is empty.
#' @export
is_empty <- function(x) UseMethod("is_empty")
#' Add an element to a queue
#' @param x A queue
#' @param elm An element
#' @return an updated queue where the element has been added
#' @export
enqueue <- function(x, elm) UseMethod("enqueue")
#' Get the front element of a queue
#' @param x A queue
#' @return the front element of the queue
#' @export
front <- function(x) UseMethod("front")
#' Remove the front element of a queue
#' @param x The queue
#' @return The updated queue
#' @export
dequeue <- function(x) UseMethod("dequeue")
## Linked lists #########################
#' Add a head item to a linked list.
#' @param elem The item to put at the head of the list.
#' @param lst The list -- it will become the tail of the new list.
#' @return a new linked list.
#' @export
list_cons <- function(elem, lst)
structure(list(head = elem, tail = lst), class = "linked_list")
list_nil <- list_cons(NA, NULL)
#' @method is_empty linked_list
#' @export
is_empty.linked_list <- function(x) identical(x, list_nil)
#' Create an empty linked list.
#' @return an empty linked list.
#' @export
empty_list <- function() list_nil
#' Get the item at the head of a linked list.
#' @param lst The list
#' @return The element at the head of the list.
#' @export
list_head <- function(lst) lst$head
#' Get the tail of a linked list.
#' @param lst The list
#' @return The tail of the list
#' @export
list_tail <- function(lst) lst$tail
#' Reverse a list
#' @param lst A list
#' @return the reverse of lst
#' @export
list_reverse <- function(lst) {
acc <- empty_list()
while (!is_empty(lst)) {
acc <- list_cons(list_head(lst), acc)
lst <- list_tail(lst)
}
acc
}
## Environment queues #################################################
queue_environment <- function(front, back) {
e <- new.env(parent = emptyenv())
e$front <- front
e$back <- back
class(e) <- c("env_queue", "environment")
e
}
#' Construct an empty closure based queue
#' @return an empty queue
#' @export
empty_env_queue <- function()
queue_environment(empty_list(), empty_list())
#' @method is_empty env_queue
#' @export
is_empty.env_queue <- function(x)
is_empty(x$front) && is_empty(x$back)
#' @method enqueue env_queue
#' @export
enqueue.env_queue <- function(x, elm) {
x$back <- list_cons(elm, x$back)
x
}
#' @method front env_queue
#' @export
front.env_queue <- function(x) {
if (is_empty(x$front)) {
x$front <- list_reverse(x$back)
x$back <- empty_list()
}
list_head(x$front)
}
#' @method dequeue env_queue
#' @export
dequeue.env_queue <- function(x) {
if (is_empty(x$front)) {
x$front <- list_reverse(x$back)
x$back <- empty_list()
}
x$front <- list_tail(x$front)
x
}
## Closure queues #####################################################
queue <- function(front, back)
list(front = front, back = back)
queue_closure <- function() {
q <- queue(empty_list(), empty_list())
get_queue <- function() q
queue_is_empty <- function() is_empty(q$front) && is_empty(q$back)
enqueue <- function(elm) {
q <<- queue(q$front, list_cons(elm, q$back))
}
front <- function() {
if (queue_is_empty()) stop("Taking the front of an empty list")
if (is_empty(q$front)) {
q <<- queue(list_reverse(q$back), empty_list())
}
list_head(q$front)
}
dequeue <- function() {
if (queue_is_empty()) stop("Taking the front of an empty list")
if (is_empty(q$front)) {
q <<- queue(list_tail(list_reverse(q$back)), empty_list())
} else {
q <<- queue(list_tail(q$front), q$back)
}
}
structure(list(is_empty = queue_is_empty,
get_queue = get_queue,
enqueue = enqueue,
front = front,
dequeue = dequeue),
class = "closure_queue")
}
#' Construct an empty closure based queue
#' @return an empty queue
#' @export
empty_closure_queue <- function() queue_closure()
#' @method is_empty closure_queue
#' @export
is_empty.closure_queue <- function(x) x$is_empty()
#' @method enqueue closure_queue
#' @export
enqueue.closure_queue <- function(x, elm) {
x$enqueue(elm)
x
}
#' @method front closure_queue
#' @export
front.closure_queue <- function(x) x$front()
#' @method dequeue closure_queue
#' @export
dequeue.closure_queue <- function(x) {
x$dequeue()
x
}
## Extended (purely functional) queues ################################
queue_extended <- function(x, front, back)
structure(list(x = x, front = front, back = back),
class = "extended_queue")
#' Construct an empty extended queue
#'
#' This is just a queue that doesn't use a closure to be able to update
#' the data structure when front is called.
#'
#' @return an empty queue
#' @export
empty_extended_queue <- function() queue_extended(NA, empty_list(),
empty_list())
#' @method is_empty extended_queue
#' @export
is_empty.extended_queue <- function(x)
is_empty(x$front) && is_empty(x$back)
#' @method enqueue extended_queue
#' @export
enqueue.extended_queue <- function(x, elm)
queue_extended(ifelse(is_empty(x$back), elm, x$x),
x$front, list_cons(elm, x$back))
#' @method front extended_queue
#' @export
front.extended_queue <- function(x) {
if (is_empty(x)) stop("Taking the front of an empty list")
if (is_empty(x$front)) x$x
else list_head(x$front)
}
#' @method dequeue extended_queue
#' @export
dequeue.extended_queue <- function(x) {
if (is_empty(x)) stop("Taking the front of an empty list")
if (is_empty(x$front))
x <- queue_extended(NA, list_reverse(x$back), empty_list())
queue_extended(x$x, list_tail(x$front), x$back)
}
## Performance experiments ######################
library(microbenchmark)
library(tibble)
library(ggplot2)
get_performance_n <- function(
algo
, n
, setup
, evaluate
, times
, ...) {
config <- setup(n)
benchmarks <- microbenchmark(evaluate(n, config), times = times)
tibble(algo = algo, n = n, time = benchmarks$time / 1e9) # time in sec
}
get_performance <- function(
algo
, ns
, setup
, evaluate
, times = 10
, ...) {
f <- function(n)
get_performance_n(algo, n, setup, evaluate, times = times, ...)
results <- Map(f, ns)
do.call('rbind', results)
}
setup <- function(n) n
evaluate <- function(empty) function(n, x) {
elements <- 1:n
queue <- empty
for (elm in elements) {
queue <- enqueue(queue, elm)
}
for (i in seq_along(elements)) {
queue <- dequeue(queue)
}
}
ns <- seq(5000, 10000, by = 1000)
performance <- rbind(get_performance("explicity environment", ns, setup,
evaluate(empty_env_queue())),
get_performance("closure environment", ns, setup,
evaluate(empty_closure_queue())),
get_performance("functional queue", ns, setup,
evaluate(empty_extended_queue())))
ggplot(performance, aes(x = as.factor(n), y = time / n, fill = algo)) +
geom_boxplot() +
scale_fill_grey("Data structure") +
xlab(quote(n)) + ylab(expression(Time / n)) + theme_minimal()
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
More information about the R-help
mailing list