[Rd] [External] Re: zapsmall(x) for scalar x

Mon Dec 18 13:56:46 CET 2023

Does mFUN() really need to be a function of x and the NA values of x? I
can't think of a case where it would be used on anything but the non-NA
values of x.

I think it would be easier to specify a different mFUN() (and document this
new argument) if the function has one argument and is applied to the non-NA
values of x.

zapsmall <- function(x,
    digits = getOption("digits"),
    mFUN = function(x) max(abs(x)),
    min.d = 0L
) {
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x[!ina])
    round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
else digits)
}

Steve

On Mon, Dec 18, 2023, 05:47 Serguei Sokol via R-devel <r-devel using r-project.org>
wrote:

> Le 18/12/2023 à 11:24, Martin Maechler a écrit :
> >>>>>> Serguei Sokol via R-devel
> >>>>>>      on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> >      > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> >      >> I think what's been missed is that zapsmall works relative to
> the absolute
> >      >> largest value in the vector. Hence if there's only one
> >      >> item in the vector, it is the largest, so its not zapped. The
> function's
> >      >> raison d'etre isn't to replace absolutely small values,
> >      >> but small values relative to the largest. Hence a vector of
> similar tiny
> >      >> values doesn't get zapped.
> >      >>
> >      >> Maybe the line in the docs:
> >      >>
> >      >> " (compared with the maximal absolute value)"
> >      >>
> >      >> needs to read:
> >      >>
> >      >> " (compared with the maximal absolute value in the vector)"
> >
> >      > I agree that this change in the doc would clarify the situation
> but
> >      > would not resolve proposed corner cases.
> >
> >      > I think that an additional argument 'mx' (absolute max value of
> >      > reference) would do. Consider:
> >
> >      > zapsmall2 <-
> >      > function (x, digits = getOption("digits"), mx=max(abs(x),
> na.rm=TRUE))
> >      > {
> >      >     if (length(digits) == 0L)
> >      >         stop("invalid 'digits'")
> >      >     if (all(ina <- is.na(x)))
> >      >         return(x)
> >      >     round(x, digits = if (mx > 0) max(0L, digits -
> >      > as.numeric(log10(mx))) else digits)
> >      > }
> >
> >      > then zapsmall2() without explicit 'mx' behaves identically to
> actual
> >      > zapsmall() and for a scalar or a vector of identical value, user
> can
> >      > manually fix the scale of what should be considered as small:
> >
> >      >> zapsmall2(y)
> >      > [1] 2.220446e-16
> >      >> zapsmall2(y, mx=1)
> >      > [1] 0
> >      >> zapsmall2(c(y, y), mx=1)
> >      > [1] 0 0
> >      >> zapsmall2(c(y, NA))
> >      > [1] 2.220446e-16           NA
> >      >> zapsmall2(c(y, NA), mx=1)
> >      > [1]  0 NA
> >
> >      > Obviously, the name 'zapsmall2' was chosen just for this
> explanation.
> >      > The original name 'zapsmall' could be reused as a full backward
> >      > compatibility is preserved.
> >
> >      > Best,
> >      > Serguei.
> >
> > Thank you, Serguei, Duncan, Barry et al.
> >
> > Generally :
> >    Yes, zapsmall was meant and is used for zapping *relatively*
> >    small numbers.  In the other cases,  directly  round()ing is
> >    what you should use.
> >
> > Specifically to Serguei's proposal of allowing the "max" value
> > to be user specified (in which case it is not really a true
> > max() anymore):
> >
> > I've spent quite a a few hours on this problem in May 2022, to
> > make it even more flexible, e.g. allowing to use a 99%
> > percentile instead of the max(), or allowing to exclude +Inf
> > from the "mx"; but -- compared to your zapsmall2() --
> > to allow reproducible automatic choice :
> >
> >
> > zapsmall <- function(x, digits = getOption("digits"),
> >                       mFUN = function(x, ina) max(abs(x[!ina])),
> >                    min.d = 0L)
> > {
> >      if (length(digits) == 0L)
> >          stop("invalid 'digits'")
> >      if (all(ina <- is.na(x)))
> >          return(x)
> >      mx <- mFUN(x, ina)
> >      round(x, digits = if(mx > 0) max(min.d, digits -
> as.numeric(log10(mx))) else digits)
> > }
> >
> > with optional 'min.d' as I had (vaguely remember to have) found
> > at the time that the '0' is also not always "the only correct" choice.
> Do you have a case or two where min.d could be useful?
>
> Serguei.
>
> >
> > Somehow I never got to propose/discuss the above,
> > but it seems a good time to do so now.
> >
> > Martin
> >
> >
> >
> >      >> barry
> >      >>
> >      >>
> >      >> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch <
> murdoch.duncan using gmail.com>
> >      >> wrote:
> >      >>
> >      >>> This email originated outside the University. Check before
> clicking links
> >      >>> or attachments.
> >      >>>
> >      >>> I'm really confused.  Steve's example wasn't a scalar x, it was
> a
> >      >>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and
> I don't
> >      >>> see why summary() would if it was using your proposal.
> >      >>>
> >      >>> Duncan Murdoch
> >      >>>
> >      >>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
> >      >>>> Isn’t that the correct outcome?  The user can change the
> number of
> >      >>> digits if they want to see small values…
> >      >>>>
> >      >>>> --
> >      >>>> Change your thoughts and you change the world.
> >      >>>> --Dr. Norman Vincent Peale
> >      >>>>
> >      >>>>> On Dec 17, 2023, at 12:11 AM, Steve Martin <
> stevemartin041 using gmail.com>
> >      >>> wrote:
> >      >>>>> Zapping a vector of small numbers to zero would cause
> problems when
> >      >>>>> printing the results of summary(). For example, if
> >      >>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0)
> then
> >      >>>>> print(summary(2.220446e-16), digits = 7) would print
> >      >>>>> Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >      >>>>> 0          0            0           0           0          0
> >      >>>>>
> >      >>>>> The same problem can also appear when printing the results of
> >      >>>>> summary.glm() with show.residuals = TRUE if there's little
> dispersion
> >      >>>>> in the residuals.
> >      >>>>>
> >      >>>>> Steve
> >      >>>>>
> >>>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg using warnes.net>
> wrote:
> >      >>>>>>
> >>>>>> I was quite suprised to discover that applying `zapsmall` to a
> scalar
> >      >>> value has no apparent effect.  For example:
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(y,)
> >>>>>> [1] 2.2204e-16
> >      >>>>>>
> >>>>>> I was expecting zapsmall(x)` to act like
> >      >>>>>>
> >      >>>>>>> round(y, digits=getOption('digits'))
> >>>>>> [1] 0
> >      >>>>>>
> >>>>>> Looking at the current source code, indicates that `zapsmall` is
> >      >>> expecting a vector:
> >>>>>> zapsmall <-
> >>>>>> function (x, digits = getOption("digits"))
> >>>>>> {
> >>>>>>       if (length(digits) == 0L)
> >>>>>>           stop("invalid 'digits'")
> >>>>>>       if (all(ina <- is.na(x)))
> >>>>>>           return(x)
> >>>>>>       mx <- max(abs(x[!ina]))
> >>>>>>       round(x, digits = if (mx > 0) max(0L, digits -
> >      >>> as.numeric(log10(mx))) else digits)
> >>>>>> }
> >      >>>>>>
> >>>>>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
> >      >>>>>>
> >>>>>> The man page simply states:
> >>>>>> zapsmall determines a digits argument dr for calling round(x,
> digits =
> >      >>> dr) such that values close to zero (compared with the maximal
> absolute
> >      >>> value) are ‘zapped’, i.e., replaced by 0.
> >>>>>> and doesn’t provide any details about how ‘close to zero’ is
> defined.
> >      >>>>>>
> >>>>>> Perhaps handling the special when `x` is a scalar (or only contains
> a
> >      >>> single non-NA value)  would make sense:
> >>>>>> zapsmall <-
> >>>>>> function (x, digits = getOption("digits"))
> >>>>>> {
> >>>>>>       if (length(digits) == 0L)
> >>>>>>           stop("invalid 'digits'")
> >>>>>>       if (all(ina <- is.na(x)))
> >>>>>>           return(x)
> >>>>>>       mx <- max(abs(x[!ina]))
> >>>>>>       round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 )
> max(0L,
> >      >>> digits - as.numeric(log10(mx))) else digits)
> >>>>>> }
> >      >>>>>>
> >>>>>> Yielding:
> >      >>>>>>
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(y)
> >>>>>> [1] 0
> >      >>>>>>
> >>>>>> Another edge case would be when all of the non-na values are the
> same:
> >      >>>>>>
> >      >>>>>>> y <- 2.220446e-16
> >      >>>>>>> zapsmall(c(y,y))
> >>>>>> [1] 2.220446e-16 2.220446e-16
> >      >>>>>>
> >>>>>> Thoughts?
> >      >>>>>>
> >      >>>>>>
> >>>>>> Gregory R. Warnes, Ph.D.
> >>>>>> greg using warnes.net
> >>>>>> Eternity is a long time, take a friend!
> >      >>>>>>
> >      >>>>>>
> >      >>>>>>
> >>>>>>           [[alternative HTML version deleted]]
> >      >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel using r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >      >>>> [[alternative HTML version deleted]]
> >      >>>>
> >      >>>> ______________________________________________
> >      >>>> R-devel using r-project.org mailing list
> >      >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >      >>> ______________________________________________
> >      >>> R-devel using r-project.org mailing list
> >      >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >      >>>
> >      >> [[alternative HTML version deleted]]
> >      >>
> >      >> ______________________________________________
> >      >> R-devel using r-project.org mailing list
> >      >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >      > --
> >      > Serguei Sokol
> >      > Ingenieur de recherche INRAE
> >
> >      > Cellule Mathématiques
> >      > TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
> >      > 135 Avenue de Rangueil
> >      > 31077 Toulouse Cedex 04
> >
> >      > tel: +33 5 61 55 98 49
> >      > email: sokol using insa-toulouse.fr
> >      >
> https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/
> >
> >      > ______________________________________________
> >      > R-devel using r-project.org mailing list
> >      > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]