[R] The L Word
Martin Maechler
maechler at stat.math.ethz.ch
Thu Feb 24 18:59:18 CET 2011
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Thu, 24 Feb 2011 18:34:36 +0100 writes:
>>>>> "HW" == Hadley Wickham <hadley at rice.edu>
>>>>> on Thu, 24 Feb 2011 10:14:35 -0600 writes:
>>> Note however that I've never seen evidence for a *practical*
>>> difference in simple cases, and also of such cases as part of
>>> a larger computation. But I'm happy to see one if anyone has
>>> an interesting example.
>>>
>>> E.g., I would typically never use 0L:100L instead of 0:100
>>> in an R script because I think code readability (and self
>>> explainability) is of considerable importance too.
HW> But : casts to integer anyway:
>>> str(0:100)
HW> int [1:101] 0 1 2 3 4 5 6 7 8 9 ...
MM> Sure !! I've been the one who had to use 0:0 or 1:1
MM> in those rare cases where integers were required (e.g. in .C(..)),
MM> before "the L word" existed.
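The point about ":" can be checked directly. The following is a minimal base R sketch (the variable names r_plain and r_L are only illustrative) showing that both spellings yield the same integer vector, and that the old 1:1 idiom gives the same value as 1L:

```r
# Base R only; no packages needed.
r_plain <- 0:100     # written with double literals
r_L     <- 0L:100L   # written with integer literals

typeof(r_plain)           # "integer"
typeof(r_L)               # "integer"
identical(r_plain, r_L)   # TRUE

# The pre-"L" idiom for a single integer constant:
identical(1:1, 1L)        # TRUE

# Contrast: c() keeps doubles, so an explicit cast is needed there.
typeof(c(0, 100))               # "double"
typeof(as.integer(c(0, 100)))   # "integer"
```

So for ":" with whole-number end points, the "L" suffix changes how the code reads, not what it returns.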
HW> And performance in this case is (obviously) negligible:
>>> library(microbenchmark)
>>> microbenchmark(as.integer(c(0, 100)), times = 1000)
HW> Unit: nanoeconds
HW>                       min  lq median  uq   max
HW> as.integer(c(0, 100)) 712 791    813 896 15840
HW> (mainly included as opportunity to try out microbenchmark)
MM> ??
MM> Thanks! Did not know it.
MM> *HOWEVER* the above as.integer(c(0,100)) is of course
MM> *much more* than what is internally needed to cast
MM> the two doubles to integer.
MM> Try this a few times ... and wonder :
MM> boxplot(mb2 <- microbenchmark(L = 1L:100L, 1:100, times=5000), notch=TRUE)
>> mb2
MM> Unit: nanoeconds
MM>       min  lq median  uq  max
MM> L     316 410    472 555 6843
MM> 1:100 311 393    440 497 7309
MM> the result (on my 64-bit linux) seems to indicate that 1L:100L
MM> takes even slightly (but significantly ["notches"]) longer.
MM> However, using
MM> boxplot(mb <- microbenchmark(1:100, L = 1L:100L, times=5000),
MM>         notch=TRUE)
>> mb
MM> Unit: nanoeconds
MM>       min  lq median  uq   max
MM> 1:100 296 401    469 550  9426
MM> L     313 396    438 496 16525
MM> is less conclusive..
MM> so, actually this is exactly one of those cases
MM> I do *not* see a difference, even if I look very hard.
MM> { BTW: There's at least one (if not two) buglet in
MM> 'microbenchmark' which I evaded using "L = " above :
MM> 1) It should not use as.character(exprs)
MM> but rather unlist(lapply(exprs, deparse))
MM> 2) boxplot.microbenchmark should probably be more careful for
MM> the case when two rows have the same name (as it happens if
MM> I leave out "L = " above)
MM> }
and I forgot the buglet in print.microbenchmark which says
"nanoeconds" (missing "s").
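Regarding buglet 1) above, here is a minimal sketch of why the two labels can collide. The list exprs is a hypothetical stand-in for a list of unevaluated calls, not microbenchmark's actual internal object:

```r
# Hypothetical stand-in for a list of unevaluated expressions.
exprs <- list(quote(1L:100L), quote(1:100))

# Coercing the list to character may drop the integer suffix,
# so both labels can come out looking the same:
labels_chr <- as.character(exprs)
labels_chr

# deparse() with its default options keeps the "L":
labels_dep <- unlist(lapply(exprs, deparse))
labels_dep   # "1L:100L" "1:100"
```

Note that deparse() can return more than one string for a long expression, so real code would probably want to collapse or truncate each result before using it as a label.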
BTW: Another -- more realistic example where I can't see any
advantage of using "L" is x[1L] vs x[1] :
> str(x <- 0 + 1:100)
num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
> t. <- microbenchmark(x[1], times=5000)
> tL <- microbenchmark(x[1L], times=5000)
> t.
Unit: nanoeconds
     min  lq median  uq  max
x[1] 198 208  241.5 299 4843
> tL
Unit: nanoeconds
     min  lq median  uq  max
x[1] 194 208    234 296 6304
>
so the noise is much, much larger than any noticeable difference.
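For anyone without microbenchmark installed, a rough base R sketch of the same comparison follows. The repetition count n and the names r_dbl, r_int, t_dbl and t_int are arbitrary choices for illustration, and system.time() on a loop is far cruder than the measurements above:

```r
x <- 0 + 1:100            # double vector, as above

r_dbl <- x[1]             # index given as a double
r_int <- x[1L]            # index given as an integer
identical(r_dbl, r_int)   # TRUE: same element either way

# Crude timing; expect run-to-run noise to swamp any difference.
n <- 1e5
t_dbl <- system.time(for (i in seq_len(n)) x[1])[["elapsed"]]
t_int <- system.time(for (i in seq_len(n)) x[1L])[["elapsed"]]
c(double_index = t_dbl, integer_index = t_int)
```

Running it a few times should give an idea of how much the two timings jump around relative to each other.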
Martin