[R] The L Word
Claudia Beleites
cbeleites at units.it
Thu Feb 24 22:30:47 CET 2011
On 02/24/2011 05:14 PM, Hadley Wickham wrote:
>> Note however that I've never seen evidence for a *practical*
>> difference in simple cases, and also of such cases as part of a
>> larger computation.
>> But I'm happy to see one if anyone has an interesting example.
>>
>> E.g., I would typically never use 0L:100L instead of 0:100
>> in an R script because I think code readability (and self
>> explainability) is of considerable importance too.
>
> But : casts to integer anyway:
I know - I just thought that on _this_ thread I ought to write it with an L ;-) and
I don't think I write 1L:100L in real life.
I use the L far more often as a reminder than for performance, particularly in
function definitions.
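To illustrate the "reminder" use (my own sketch, not from the thread): writing an integer literal in a default argument documents that the parameter is meant to be a whole-number count, and keeps its type stable.

```r
# A sketch: the L suffix documents intent and fixes the storage type.
# take_first() is a hypothetical example, not from the original post.
take_first <- function(x, n = 3L) {
  head(x, n)
}

is.integer(1L)  # TRUE
is.integer(1)   # FALSE: a plain 1 is a double
```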
>
>> str(0:100)
> int [1:101] 0 1 2 3 4 5 6 7 8 9 ...
>
> And performance in this case is (obviously) negligible:
>
>> library(microbenchmark)
>> microbenchmark(as.integer(c(0, 100)), times = 1000)
> Unit: nanoseconds
>                        min  lq median  uq   max
> as.integer(c(0, 100))  712 791    813 896 15840
>
> (mainly included as opportunity to try out microbenchmark)
> So you save ~800 ns but typing two letters probably takes 0.2 s (100
> wpm, ~ 5 letters per word + space = 0.1s per letter), so it only saves
> you time if you're going to be calling it more than 125000 times ;)
Calling something 125000 times does happen in my real life. I have, e.g., one data
set with 2e5 spectra (and another batch of that size waiting for me), so anything
done "for each spectrum" reaches this number each time the function is needed.
Also, of course, the conversion time grows with the length of the vector.
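As a rough illustration (my own sketch, not from the thread): coercion has to visit every element, so its cost scales linearly with vector length, while the result only changes its storage type.

```r
# Sketch: as.integer() walks the whole vector, so a single coercion
# on a long vector is not free.
x_short <- as.numeric(1:1e2)
x_long  <- as.numeric(1:1e6)
system.time(as.integer(x_long))  # noticeably longer than for x_short

# The values are unchanged; only the storage type differs:
identical(as.integer(x_short), 1:1e2)  # TRUE
```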
On the other hand, in > 95 % of cases, taking an hour to think about the
algorithm will have a much larger effect ;-).
Also, I notice that the first few measurements from microbenchmark are often much
longer (for fast operations). That may just indicate that total speed depends much
more on whether the code allows caching, in which case such coding details may or
may not help at all: a single such conversion may take disproportionately more time.
I just (yesterday) came across a situation where the difference between numeric
and integer does matter (considering that I do this with an array of about
3e4 x 125 x 6): as.factor
> microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3)+0.0))
Unit: nanoseconds
      min      lq  median      uq     max
i   884039  891106  895847  901630 2524877
d  2698637 2770936 2778271 2807572 4266197
but then:
> microbenchmark (
sd = structure ((1:1e3)+0.0, .Label = 1:100, class = "factor"),
si = structure ((1:1e3)+0L, .Label = 1:100, class = "factor"))
Unit: nanoseconds
      min     lq median     uq     max
sd  52875  53615  54040  54448 1385422
si  45904  46936  47332  47778   65360
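A side note (my addition, not in the original exchange): the sd variant above builds a "factor" whose codes are stored as doubles, which is not a valid factor internally - factor codes are supposed to be integers. structure() does not check this, and a quick way to see the difference:

```r
# Factor codes should be integers; structure() sets attributes blindly.
# (Using the levels attribute directly here, rather than .Label.)
f_int <- structure(1:3, levels = c("a", "b", "c"), class = "factor")
f_dbl <- structure(c(1, 2, 3), levels = c("a", "b", "c"), class = "factor")
typeof(f_int)  # "integer" - a proper factor
typeof(f_dbl)  # "double"  - prints like a factor, but is fragile
```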
Cheers,
Claudia
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbeleites at units.it