[R] The L Word
Claudia Beleites
cbeleites at units.it
Thu Feb 24 22:30:47 CET 2011
On 02/24/2011 05:14 PM, Hadley Wickham wrote:
>> Note however that I've never seen evidence for a *practical*
>> difference in simple cases, and also of such cases as part of a
>> larger computation.
>> But I'm happy to see one if anyone has an interesting example.
>>
>> E.g., I would typically never use 0L:100L instead of 0:100
>> in an R script because I think code readability (and self
>> explainability) is of considerable importance too.
>
> But : casts to integer anyway:
I know - I just thought that on _this_ thread I ought to write it with an L ;-) and
I don't think I write 1L:100L in real life.
I use the L far more often as a reminder than for performance, particularly in
function definitions.
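To illustrate the "reminder" use (my own sketch, not from the thread): writing an integer literal in a default argument documents that the parameter is meant to be a whole-number count, and keeps its type stable.

```r
# A sketch: the L suffix documents intent and fixes the storage type.
# take_first() is a hypothetical example, not from the original post.
take_first <- function(x, n = 3L) {
  head(x, n)
}

is.integer(1L)  # TRUE
is.integer(1)   # FALSE: a plain 1 is a double
```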
>
>> str(0:100)
> int [1:101] 0 1 2 3 4 5 6 7 8 9 ...
>
> And performance in this case is (obviously) negligible:
>
>> library(microbenchmark)
>> microbenchmark(as.integer(c(0, 100)), times = 1000)
> Unit: nanoseconds
>                        min  lq median  uq   max
> as.integer(c(0, 100))  712 791    813 896 15840
>
> (mainly included as opportunity to try out microbenchmark)
> So you save ~800 ns but typing two letters probably takes 0.2 s (100
> wpm, ~ 5 letters per word + space = 0.1s per letter), so it only saves
> you time if you're going to be calling it more than 125000 times ;)
Calling something 125000 times does happen in my real life. I have, e.g., one data
set with 2e5 spectra (and another batch of that size waiting for me), so anything
done "for each spectrum" reaches this number each time the function is needed.
Also, of course, the conversion time grows with the length of the vector.
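As a rough illustration (my own sketch, not from the thread): coercion has to visit every element, so its cost scales linearly with vector length, while the result only changes its storage type.

```r
# Sketch: as.integer() walks the whole vector, so a single coercion
# on a long vector is not free.
x_short <- as.numeric(1:1e2)
x_long  <- as.numeric(1:1e6)
system.time(as.integer(x_long))  # noticeably longer than for x_short

# The values are unchanged; only the storage type differs:
identical(as.integer(x_short), 1:1e2)  # TRUE
```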
On the other hand, in > 95 % of cases, taking an hour to think about the
algorithm will have a much larger effect ;-).
Also, I notice that the first few measurements from microbenchmark are often much
longer (for fast operations). That may just indicate that total speed depends much
more on whether the code allows caching, in which case such coding details may or
may not help at all: a single such conversion may take disproportionately more time.
I just (yesterday) came across a situation where the difference between numeric
and integer does matter (considering that I do this with an array of about
3e4 x 125 x 6): as.factor
> microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3)+0.0))
Unit: nanoseconds
      min      lq  median      uq     max
i   884039  891106  895847  901630 2524877
d  2698637 2770936 2778271 2807572 4266197
but then:
> microbenchmark (
sd = structure ((1:1e3)+0.0, .Label = 1:100, class = "factor"),
si = structure ((1:1e3)+0L, .Label = 1:100, class = "factor"))
Unit: nanoseconds
      min     lq median     uq     max
sd  52875  53615  54040  54448 1385422
si  45904  46936  47332  47778   65360
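A side note (my addition, not in the original exchange): the sd variant above builds a "factor" whose codes are stored as doubles, which is not a valid factor internally - factor codes are supposed to be integers. structure() does not check this, and a quick way to see the difference:

```r
# Factor codes should be integers; structure() sets attributes blindly.
# (Using the levels attribute directly here, rather than .Label.)
f_int <- structure(1:3, levels = c("a", "b", "c"), class = "factor")
f_dbl <- structure(c(1, 2, 3), levels = c("a", "b", "c"), class = "factor")
typeof(f_int)  # "integer" - a proper factor
typeof(f_dbl)  # "double"  - prints like a factor, but is fragile
```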
Cheers,
Claudia
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbeleites at units.it