[Rd] Why is the diag function so slow (for extraction)?
Steve Bronder
sbronder at stevebronder.com
Mon May 11 19:10:02 CEST 2015
Sorry if this is a re-post, not sure if my original message got though. Is
it possible to replace c() with .subset()? Or would this throw errors
because of something specific in c() or as.matrix()? There's a pretty nice
speed up by replacing c() with the .subset().
Ex.
----
library(microbenchmark)
diag2 <- function(x,nrow, ncol){
if (is.matrix(x)) {
if (nargs() > 1L)
stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
if ((m <- min(dim(x))) == 0L)
return(vector(typeof(x), 0L))
# replace this part
y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
nms <- dimnames(x)
if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <-
nms[[1L]][seq_len(m)]),
nms[[2L]][seq_len(m)]))
names(y) <- nm
return(y)
}
if (is.array(x) && length(dim(x)) != 1L)
stop("'x' is an array, but not one-dimensional.")
if (missing(x))
n <- nrow
else if (length(x) == 1L && nargs() == 1L) {
n <- as.integer(x)
x <- 1
}
else n <- length(x)
if (!missing(nrow))
n <- nrow
if (missing(ncol))
ncol <- n
}
# Tests
nc <- 1e04
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
runoff <- microbenchmark(
diaga <- diag(m),
diagb <- diag2(m)
)
runoff
Unit: microseconds
expr min lq mean median uq max
neval
diaga 429033.896 434186.694 512143.6728 503355.5865 572811.11 656035.584
100
diagb 216.112 251.445 536.8531 688.3595 706.98 2437.921
100
Regards,
Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com
On Thu, May 7, 2015 at 11:49 AM, Steve Bronder <sbronder at stevebronder.com>
wrote:
> Is it possible to replace c() with .subset()? Example below
>
> ####
> ####
>
> library(microbenchmark)
>
>
> diag2 <- function(x,nrow, ncol){
> if (is.matrix(x)) {
> if (nargs() > 1L)
> stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
> if ((m <- min(dim(x))) == 0L)
> return(vector(typeof(x), 0L))
> # replace this part
> y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
> nms <- dimnames(x)
> if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <-
> nms[[1L]][seq_len(m)]),
>
> nms[[2L]][seq_len(m)]))
> names(y) <- nm
> return(y)
> }
> if (is.array(x) && length(dim(x)) != 1L)
> stop("'x' is an array, but not one-dimensional.")
> if (missing(x))
> n <- nrow
> else if (length(x) == 1L && nargs() == 1L) {
> n <- as.integer(x)
> x <- 1
> }
> else n <- length(x)
> if (!missing(nrow))
> n <- nrow
> if (missing(ncol))
> ncol <- n
> }
>
> nc <- 10
>
> set.seed(1)
> m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
>
>
> runoff <- microbenchmark(
> diaga = diag(m),
> diagb = diag2(m)
> )
>
> Regards,
>
> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbronder at stevebronder.com
>
>
> On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
>
>> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
>> why the change was made many years ago, but this was before names were
>> handled explicitly. It would definitely be better to not force the
>> duplicate, at least in the case where we are sure c() and [ would not
>> dispatch.
>>
>> Best,
>>
>> luke
>>
>> On Mon, 4 May 2015, peter dalgaard wrote:
>>
>>
>>> On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote:
>>>>
>>>> But I'm still wondering why diag() uses c()...? With it being so slow,
>>>> I'd
>>>> be inclined to write a qdiag() without the c() and just use that the
>>>> next
>>>> time I need matrix algebra. Any insight would be appreciated; thanks!
>>>>
>>>
>>> Well, there are two possibilities: Either it is deliberate or it isn't.
>>>
>>> The latter isn't too unlikely, given that the effect is seen for large
>>> matrices. I would appear to be a matter of O(n) (picking out n items) vs.
>>> O(n^2) (copying an n x n matrix), but this might drown out in a context
>>> involving matrix multiplication and/or inversion, both of which are O(n^3).
>>>
>>> If it is deliberate, the question is why. There could be devils in the
>>> details; notice in particular that c() strips off non-name attributes.
>>> However, I'm not aware of a situation where such attributes could cause
>>> trouble.
>>>
>>> -pd
>>>
>>>
>>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa Phone: 319-335-3386
>> Department of Statistics and Fax: 319-335-3017
>> Actuarial Science
>> 241 Schaeffer Hall email: luke-tierney at uiowa.edu
>> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list