[Rd] Undefined behavior of head() and tail() with n = 0
Martin Maechler
maechler at stat.math.ethz.ch
Fri Jan 27 14:55:38 CET 2017
Dear Florent,
thank you for striving to clearly disentangle and present the
issue below.
That is a nice "role model" way of approaching such topics!
>>>>> Florent Angly <florent.angly at gmail.com>
>>>>> on Fri, 27 Jan 2017 10:24:39 +0100 writes:
> Martin, I agree with you that +0 and -0 should generally be treated as
> equal, and R does a fine job in this respect. The Wikipedia article on
> signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
> view but also highlights that +0 and -0 can be treated differently in
> particular situations, including their interpretation as mathematical
> limits (as in the 1/-0 case). Indeed, the main question here is
> whether head() and tail() represent a special case that would benefit
> from differentiating between +0 and -0.
> We can break down the discussion into two problems:
> A/ the discrepancy between the implementation of R head() and tail()
> and the documentation of these functions (where the use of zero is not
> documented and thus not permissible),
Ehm, no, in R (and many other software systems),
"not documented" does *NOT* entail "not permissible"
> B/ the discrepancy between the implementation of R head() and tail()
> and their GNU equivalent (which allow zeros and differentiate between
> -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").
This discrepancy, as you mention later, comes from the fact that
these arguments are basically strings in the Unix tools (GNU being a
special case of Unix here) and integers in R.
Below, I'm giving my personal view of the issue:
> There are several possible solutions to address these discrepancies:
> 1/ Leave the code as-is but document its behavior with respect to zero
> (zeros allowed, with negative zeros treated like positive zeros).
> Advantages: This is the path of least resistance, and discrepancy A is fixed.
> Disadvantages: Discrepancy B remains (but is documented).
That would be my "clear" choice.
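For concreteness, a small sketch of the behavior that option 1/ would
document. It uses only head() and tail() as shipped in utils, and the
results agree with the output reported in the original post quoted
further down:

```r
x <- 1:5

# n = 0 and n = -0 compare equal, so head() and tail() cannot tell them apart
h_pos <- head(x, n = 0)
h_neg <- head(x, n = -0)
t_pos <- tail(x, n = 0)
t_neg <- tail(x, n = -0)

# All four calls return a zero-length vector
print(h_pos)
print(identical(h_pos, h_neg))
print(identical(t_pos, t_neg))
```

The documentation change would then amount to saying that n = 0 selects
nothing, whatever the sign of the zero.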
> 2/ Leave the documentation as-is but reflect this in code by not
> allowing zeros at all.
> Advantages: Discrepancy A is fixed.
> Disadvantages: Discrepancy B remains in some form (but is documented).
> Need to deprecate the usage of +0 (which was not clearly documented
> but may have been assumed by users).
2/ looks "uniformly inferior" to 1/ to me
> 3/ Update the code and documentation to differentiate between +0 and -0.
> Advantages: In my eyes, this is the ideal solution since discrepancy A
> and (most of) B are resolved.
> Disadvantages: It is unclear how to implement this solution and the
> implications it may have on backward compatibility:
> a/ Allow -0 (as double). But is it supported on all platforms used
> by R (see ?Arithmetic)? William has raised the issue that negative
> zero cannot be represented as an integer. Should head() and tail()
> then strictly check double input (while forbidding integers)?
> b/ The input could always be given as a character string. This would
> allow mirroring GNU tail even more closely (where the prefix "+" is
> used to invert the meaning of n). This probably involves a fair amount
> of work and careful handling of deprecation.
3/ involves quite a few complications, and in my view, your
advantages do not even come close to outweighing the drawbacks.
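As a rough illustration of the complication in 3a/: a sign-of-zero
check can only ever work on doubles. The helper name is_negative_zero()
below is made up for this sketch and is not part of R:

```r
# Hypothetical helper (not an R function): TRUE only for a double -0
is_negative_zero <- function(n) {
  is.double(n) && length(n) == 1L && !is.na(n) && n == 0 && 1 / n < 0
}

is_negative_zero(-0)    # double negative zero
is_negative_zero(0)     # double positive zero
is_negative_zero(-0L)   # integer zero: the sign is already lost
is_negative_zero(as.integer(-0))
```

Only the first call can return TRUE, so any caller passing an integer n
(which is common) could never request the "-0" behavior.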
> On 26 January 2017 at 16:51, William Dunlap <wdunlap at tibco.com> wrote:
>> In addition, signed zeroes only exist for floating point numbers - the
>> bit patterns for as.integer(0) and as.integer(-0) are identical.
indeed!
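A short sketch of the places where the two zeroes can and cannot be
told apart in R, using only base functions and the examples named in
?Arithmetic:

```r
# Integers have a single zero
int_same <- identical(as.integer(0), as.integer(-0))

# Doubles: the zeroes compare equal ...
dbl_equal <- (0 == -0)
dbl_ident <- identical(0, -0)              # num.eq = TRUE is the default

# ... but differ in division and in a bitwise comparison
pos_inv <- 1 / 0
neg_inv <- 1 / -0
dbl_bits <- identical(0, -0, num.eq = FALSE)

print(c(int_same, dbl_equal, dbl_ident, dbl_bits))
print(c(pos_inv, neg_inv))
```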
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
>>>>>>>> Florent Angly <florent.angly at gmail.com>
>>>>>>>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>>
>>> > Hi all,
>>> > The documentation for head() and tail() describes the behavior of
>>> > these generic functions when n is strictly positive (n > 0) and
>>> > strictly negative (n < 0). How these functions work when given a zero
>>> > value is not defined.
>>>
>>> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
>>> > http://man7.org/linux/man-pages/man1/head.1.html
>>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>>
>>> > Since R supports signed zeros (1/+0 != 1/-0)
>>>
>>> whoa, whoa, .. slow down -- The above is misleading!
>>>
>>> Rather read in ?Arithmetic (*the* reference to consult for such issues),
>>> where the 2nd part of the following section
>>>
>>> || Implementation limits:
>>> ||
>>> || [..............]
>>> ||
>>> || Another potential issue is signed zeroes: on IEC 60559 platforms
>>> || there are two zeroes with internal representations differing by
>>> || sign. Where possible R treats them as the same, but for example
>>> || direct output from C code often does not do so and may output
>>> || ‘-0.0’ (and on Windows whether it does so or not depends on the
>>> || version of Windows). One place in R where the difference might be
>>> || seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
>>> || the sign of zero ‘x’. Another place is ‘identical(0, -0, num.eq =
>>> || FALSE)’.
>>>
>>> says the *contrary* ( __Where possible R treats them as the same__ ):
>>> We do _not_ want to distinguish -0 and +0,
>>> but there are cases where it is unavoidable.
>>>
>>> And there are good reasons (mathematics !!) for this.
>>>
>>> I'm pretty sure that it would be quite a mistake to start
>>> differentiating it here... but of course we can continue
>>> discussing here if you like.
>>>
>>> Martin Maechler
>>> ETH Zurich and R Core
>>>
>>>
>>> > and the R head() and tail() functions are modeled after
>>> > their GNU counterparts, I would expect the R functions to
>>> > distinguish between +0 and -0
>>>
>>> >> tail(1:5, n=0)
>>> > integer(0)
>>> >> tail(1:5, n=1)
>>> > [1] 5
>>> >> tail(1:5, n=2)
>>> > [1] 4 5
>>>
>>> >> tail(1:5, n=-2)
>>> > [1] 3 4 5
>>> >> tail(1:5, n=-1)
>>> > [1] 2 3 4 5
>>> >> tail(1:5, n=-0)
>>> > integer(0) # expected 1:5
>>>
>>> >> head(1:5, n=0)
>>> > integer(0)
>>> >> head(1:5, n=1)
>>> > [1] 1
>>> >> head(1:5, n=2)
>>> > [1] 1 2
>>>
>>> >> head(1:5, n=-2)
>>> > [1] 1 2 3
>>> >> head(1:5, n=-1)
>>> > [1] 1 2 3 4
>>> >> head(1:5, n=-0)
>>> > integer(0) # expected 1:5
>>>
>>> > For both head() and tail(), I expected 1:5 as output but got
>>> > integer(0). I obtained similar results using a data.frame and a
>>> > function as x argument.
>>>
>>> > An easy fix would be to explicitly state in the documentation what n =
>>> > 0 does, and that there is no practical difference between -0 and +0.
>>> > However, in my eyes, the better approach would be to implement support
>>> > for -0 and document it. What do you think?
>>>
>>> > Best,
>>>
>>> > Florent
>>>
>>>
>>> > PS/ My sessionInfo() gives:
>>> > R version 3.3.2 (2016-10-31)
>>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>>> > Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>
>>> > locale:
>>> > [1] LC_COLLATE=German_Switzerland.1252
>>> > LC_CTYPE=German_Switzerland.1252
>>> > LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>> > LC_TIME=German_Switzerland.1252
>>>
>>> > attached base packages:
>>> > [1] stats graphics grDevices utils datasets methods base
>>>
>>> > ______________________________________________
>>> > R-devel at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel