[R] The L Word

Thu Feb 24 18:59:18 CET 2011

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Thu, 24 Feb 2011 18:34:36 +0100 writes:

>>>>> "HW" == Hadley Wickham <hadley at rice.edu>
>>>>>     on Thu, 24 Feb 2011 10:14:35 -0600 writes:

    >>> Note however that I've never seen evidence for a *practical*
    >>> difference in simple cases, nor for such cases as part of
    >>> a larger computation.  But I'm happy to see one if anyone has
    >>> an interesting example.
    >>> 
    >>> E.g., I would typically never use  0L:100L  instead of 0:100
    >>> in an R script because I think code readability (and self
    >>> explainability) is of considerable importance too.

    HW> But : casts to integer anyway:

    >>> str(0:100)
    HW> int [1:101] 0 1 2 3 4 5 6 7 8 9 ...

    MM> Sure !!  I've been the one who had used  0:0  or 1:1
    MM> in those rare cases where integers were required (e.g. in .C(..)),
    MM> before "the L word" existed.
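
The point that `:` already returns integers can be checked directly in base R. This is a minimal sketch; the variable names and the fractional case (included only for contrast) are illustrative and not part of the original exchange:

```r
# `:` yields an integer vector when the start value is whole-valued
# and the result fits in the integer range, with or without the L suffix.
t_plain <- typeof(0:100)
t_L     <- typeof(0L:100L)
same    <- identical(0:100, 0L:100L)

# A fractional start value gives a double vector instead.
t_frac  <- typeof(0.5:3)

print(c(plain = t_plain, L = t_L, frac = t_frac))
print(same)
```

So, for whole-number endpoints, the "L" suffix changes neither the type nor the value of the resulting sequence.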

    HW> And performance in this case is (obviously) negligible:

    >>> library(microbenchmark)
    >>> microbenchmark(as.integer(c(0, 100)), times = 1000)
    HW> Unit: nanoeconds
    HW>                       min  lq median  uq   max
    HW> as.integer(c(0, 100)) 712 791    813 896 15840

    HW> (mainly included as opportunity to try out microbenchmark)
    MM> ??
    MM> Thanks!  Did not know it.

    MM> *HOWEVER* the above as.integer(c(0,100)) is of course
    MM> *much more* than what is internally needed to cast 
    MM> the two   doubles to integer.

    MM> Try this a few times ... and wonder :

    MM> boxplot(mb2 <- microbenchmark(L = 1L:100L, 1:100, times=5000), notch=TRUE)

    >> mb2
    MM> Unit: nanoeconds
    MM>       min  lq median  uq  max
    MM> L     316 410    472 555 6843
    MM> 1:100 311 393    440 497 7309

    MM> the result (on my 64-bit linux) seems to indicate that 1L:100L
    MM> takes even slightly (but significantly ["notches"]) longer.

    MM> However, using

    MM> boxplot(mb <- microbenchmark(1:100, L = 1L:100L, times=5000),
    MM>         notch=TRUE)

    >> mb
    MM> Unit: nanoeconds
    MM>       min  lq median  uq   max
    MM> 1:100 296 401    469 550  9426
    MM> L     313 396    438 496 16525

    MM> is less conclusive ...
    MM> so, actually this is exactly one of those cases where
    MM> I do *not* see a difference, even if I look very hard.

    MM> { BTW: There's at least one (if not two) buglets in
    MM>    'microbenchmark'   which I evaded using "L = " above :

    MM> 1) It should not use   as.character(exprs)  
    MM> but rather          unlist(lapply(exprs, deparse))

    MM> 2) boxplot.microbenchmark should probably be more careful in
    MM> the case when two rows have the same name (as happens if
    MM> I leave out "L = " above)
    MM> }
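
The labelling issue in point 1) can be illustrated without microbenchmark itself. In this minimal sketch, `exprs` is a hypothetical stand-in for the list of unevaluated expressions such a function would collect; it is an assumption made for illustration, not the package's actual internals:

```r
# Hypothetical stand-in for the unevaluated expressions handed to a
# benchmarking function (not taken from the microbenchmark source).
exprs <- list(quote(1L:100L), quote(1:100))

# Route 1: plain coercion of the list to character.
labels_chr <- as.character(exprs)

# Route 2: deparse each expression; the default deparse options keep
# the "L" suffix on integer constants.
labels_dep <- unlist(lapply(exprs, deparse))

print(labels_chr)
print(labels_dep)

# 0 means no label occurs twice.
print(anyDuplicated(labels_dep))
```

If the coerced labels come out identical for both expressions, two rows end up sharing a name, which is the situation point 2) describes. Note that `deparse` can return several strings for a long expression, so a real implementation would presumably need to collapse those into one label.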

and I forgot the buglet in  print.microbenchmark which says
"nanoeconds" (missing "s").

BTW: Another, more realistic, example where I can't see any
advantage of using "L" is  x[1L]  vs  x[1] :

  > str(x <- 0 + 1:100)
   num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
  > t. <- microbenchmark(x[1], times=5000)
  > tL <- microbenchmark(x[1L], times=5000)
  > t.
  Unit: nanoeconds
       min  lq median  uq  max
  x[1] 198 208  241.5 299 4843
  > tL
  Unit: nanoeconds
       min  lq median  uq  max
  x[1] 194 208    234 296 6304
  > 

so the noise is much, much larger than any noticeable difference.
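
For anyone without microbenchmark installed, the same comparison can be roughed out in base R. This is only a sketch under that assumption: the loop count `n` and the variable names are arbitrary, and `system.time` is far coarser than the nanosecond timings above:

```r
x <- 0 + 1:100                      # double vector, as in the transcript above

# Both index forms select the same element.
same <- identical(x[1], x[1L])

# Crude timing: repeat each indexing operation many times.
n <- 1e5
t_dbl <- system.time(for (i in seq_len(n)) x[1])[["elapsed"]]
t_int <- system.time(for (i in seq_len(n)) x[1L])[["elapsed"]]

print(same)
print(c(double_index = t_dbl, integer_index = t_int))
```

Running this a few times should make it clear whether any difference between the two timings stands out from the run-to-run variation.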

Martin