[Rd] max on numeric_version with long components

Sat Apr 27 21:44:43 CEST 2024

On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane <jkeane using gmail.com> wrote:

> In devel:
> > max(numeric_version(c("1.0.1.100000000", "1.0.3.100000000", "1.0.2.100000000")))
> [1] ‘1.0.1.100000000’
> > max(numeric_version(c("1.0.1.10000000", "1.0.3.10000000", "1.0.2.10000000")))
> [1] ‘1.0.3.10000000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max(<numeric_version>) was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behaviour is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.100000000", "1.0.3.100000000", "1.0.2.100000000"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "000000001000000000000000001575360400"
# [2] "000000001000000000000000003575360400"
# [3] "000000001000000000000000002575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "000000001000000000000000001575360400"
# [2] "000000001000000000000000002575360400"
# [3] "000000001000000000000000003575360400"

# but not which.max:
which.max(e)
# [1] 1
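The trailing "575360400" may look surprising next to a component of
100000000. Judging only from the output above (an inference, not a
statement about the internals of .encode_numeric_version), each
component appears to be written in octal and zero-padded to a common
width. A small sketch that reproduces e[1] under that assumption:

```r
# Assumption: every component is zero-padded to 9 digits and written in octal.
parts <- c(1L, 0L, 1L, 100000000L)          # components of "1.0.1.100000000"
guess <- paste(sprintf("%09o", parts), collapse = "")
guess
# Compare with the first encoded string shown above.
identical(guess, "000000001000000000000000001575360400")
```

So the encoded strings are 36 digits long, and the three versions first
differ at the 27th digit.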

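This happens because which.max() converts its argument to double, which
loses precision: the encoded strings first differ at the 27th digit,
far beyond the 15 to 17 significant decimal digits a double can hold: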

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE
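
To put rough numbers on the loss, here is a small sketch using only base
R. The variable names ulp and gap are mine, chosen for illustration; the
figures follow from the double format, not from anything specific to
numeric_version:

```r
# Integers are exactly representable in a double only up to 2^53.
2^53 == 2^53 + 1                     # TRUE: the "+ 1" is rounded away

# Spacing between adjacent doubles near 1e27 (one unit in the last place).
ulp <- 2^(floor(log2(1e27)) - 52)
ulp                                  # 2^37, roughly 1.4e11

# The encoded values above differ by 1e9 or 2e9, far less than that
# spacing, so they can round to the same double (as they do here).
gap <- 2e9
gap < ulp / 2
```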

I would be curious to know whether there is a clever way to keep both
the O(N) complexity and full arbitrary precision.
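
One possible direction, offered only as an untested-in-R-devel sketch
and not as a proposed patch: skip the encoding and narrow down the
candidates one component at a time, comparing the integer components
directly. which_max_version is a hypothetical helper, not part of base
R, and treating missing trailing components as 0 is my assumption about
the desired ordering. Each pass is linear in the number of remaining
candidates, so the cost is O(N * number of components):

```r
# Hypothetical helper: index of the largest version, found without
# converting anything to double.
which_max_version <- function(x) {
  comps <- unclass(x)                  # list of integer vectors
  cand <- which(lengths(comps) > 0L)   # skip empty (NA) versions
  if (length(cand) == 0L) return(integer(0))
  depth <- max(lengths(comps[cand]))
  for (k in seq_len(depth)) {
    # k-th component of each candidate; assumption: missing ones count as 0
    v <- vapply(comps[cand],
                function(p) if (length(p) >= k) p[[k]] else 0L,
                integer(1))
    cand <- cand[v == max(v)]          # keep only the leaders so far
    if (length(cand) == 1L) break
  }
  cand[1L]                             # first index among any ties
}

x <- numeric_version(c(
  "1.0.1.100000000", "1.0.3.100000000", "1.0.2.100000000"
))
which_max_version(x)
x[which_max_version(x)]
```

For the example above this picks the second element. A real
implementation would presumably do the scan in C; the R-level vapply()
here is only meant to show the idea.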

-- 
Best regards,
Ivan