[Rd] [External] Re: zapsmall(x) for scalar x
Serguei Sokol
@oko| @end|ng |rom |n@@-tou|ou@e@|r
Mon Dec 18 11:47:19 CET 2023
Le 18/12/2023 à 11:24, Martin Maechler a écrit :
>>>>>> Serguei Sokol via R-devel
>>>>>> on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> >> I think what's been missed is that zapsmall works relative to the absolute
> >> largest value in the vector. Hence if there's only one
> >> item in the vector, it is the largest, so its not zapped. The function's
> >> raison d'etre isn't to replace absolutely small values,
> >> but small values relative to the largest. Hence a vector of similar tiny
> >> values doesn't get zapped.
> >>
> >> Maybe the line in the docs:
> >>
> >> " (compared with the maximal absolute value)"
> >>
> >> needs to read:
> >>
> >> " (compared with the maximal absolute value in the vector)"
>
> > I agree that this change in the doc would clarify the situation but
> > would not resolve proposed corner cases.
>
> > I think that an additional argument 'mx' (absolute max value of
> > reference) would do. Consider:
>
> > zapsmall2 <-
> > function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
> > {
> > if (length(digits) == 0L)
> > stop("invalid 'digits'")
> > if (all(ina <- is.na(x)))
> > return(x)
> > round(x, digits = if (mx > 0) max(0L, digits -
> > as.numeric(log10(mx))) else digits)
> > }
>
> > then zapsmall2() without explicit 'mx' behaves identically to actual
> > zapsmall() and for a scalar or a vector of identical value, user can
> > manually fix the scale of what should be considered as small:
>
> >> zapsmall2(y)
> > [1] 2.220446e-16
> >> zapsmall2(y, mx=1)
> > [1] 0
> >> zapsmall2(c(y, y), mx=1)
> > [1] 0 0
> >> zapsmall2(c(y, NA))
> > [1] 2.220446e-16 NA
> >> zapsmall2(c(y, NA), mx=1)
> > [1] 0 NA
>
> > Obviously, the name 'zapsmall2' was chosen just for this explanation.
> > The original name 'zapsmall' could be reused as a full backward
> > compatibility is preserved.
>
> > Best,
> > Serguei.
>
> Thank you, Serguei, Duncan, Barry et al.
>
> Generally :
> Yes, zapsmall was meant and is used for zapping *relatively*
> small numbers. In the other cases, directly round()ing is
> what you should use.
>
> Specifically to Serguei's proposal of allowing the "max" value
> to be user specified (in which case it is not really a true
> max() anymore):
>
> I've spent quite a a few hours on this problem in May 2022, to
> make it even more flexible, e.g. allowing to use a 99%
> percentile instead of the max(), or allowing to exclude +Inf
> from the "mx"; but -- compared to your zapsmall2() --
> to allow reproducible automatic choice :
>
>
> zapsmall <- function(x, digits = getOption("digits"),
> mFUN = function(x, ina) max(abs(x[!ina])),
> min.d = 0L)
> {
> if (length(digits) == 0L)
> stop("invalid 'digits'")
> if (all(ina <- is.na(x)))
> return(x)
> mx <- mFUN(x, ina)
> round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx))) else digits)
> }
>
> with optional 'min.d' as I had (vaguely remember to have) found
> at the time that the '0' is also not always "the only correct" choice.
Do you have a case or two where min.d could be useful?
Serguei.
>
> Somehow I never got to propose/discuss the above,
> but it seems a good time to do so now.
>
> Martin
>
>
>
> >> barry
> >>
> >>
> >> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> >> wrote:
> >>
> >>> This email originated outside the University. Check before clicking links
> >>> or attachments.
> >>>
> >>> I'm really confused. Steve's example wasn't a scalar x, it was a
> >>> vector. Your zapsmall() proposal wouldn't zap it to zero, and I don't
> >>> see why summary() would if it was using your proposal.
> >>>
> >>> Duncan Murdoch
> >>>
> >>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
> >>>> Isn’t that the correct outcome? The user can change the number of
> >>> digits if they want to see small values…
> >>>>
> >>>> --
> >>>> Change your thoughts and you change the world.
> >>>> --Dr. Norman Vincent Peale
> >>>>
> >>>>> On Dec 17, 2023, at 12:11 AM, Steve Martin <stevemartin041 using gmail.com>
> >>> wrote:
> >>>>> Zapping a vector of small numbers to zero would cause problems when
> >>>>> printing the results of summary(). For example, if
> >>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
> >>>>> print(summary(2.220446e-16), digits = 7) would print
> >>>>> Min. 1st Qu. Median Mean 3rd Qu. Max.
> >>>>> 0 0 0 0 0 0
> >>>>>
> >>>>> The same problem can also appear when printing the results of
> >>>>> summary.glm() with show.residuals = TRUE if there's little dispersion
> >>>>> in the residuals.
> >>>>>
> >>>>> Steve
> >>>>>
>>>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg using warnes.net> wrote:
> >>>>>>
>>>>>> I was quite suprised to discover that applying `zapsmall` to a scalar
> >>> value has no apparent effect. For example:
> >>>>>>> y <- 2.220446e-16
> >>>>>>> zapsmall(y,)
>>>>>> [1] 2.2204e-16
> >>>>>>
>>>>>> I was expecting zapsmall(x)` to act like
> >>>>>>
> >>>>>>> round(y, digits=getOption('digits'))
>>>>>> [1] 0
> >>>>>>
>>>>>> Looking at the current source code, indicates that `zapsmall` is
> >>> expecting a vector:
>>>>>> zapsmall <-
>>>>>> function (x, digits = getOption("digits"))
>>>>>> {
>>>>>> if (length(digits) == 0L)
>>>>>> stop("invalid 'digits'")
>>>>>> if (all(ina <- is.na(x)))
>>>>>> return(x)
>>>>>> mx <- max(abs(x[!ina]))
>>>>>> round(x, digits = if (mx > 0) max(0L, digits -
> >>> as.numeric(log10(mx))) else digits)
>>>>>> }
> >>>>>>
>>>>>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
> >>>>>>
>>>>>> The man page simply states:
>>>>>> zapsmall determines a digits argument dr for calling round(x, digits =
> >>> dr) such that values close to zero (compared with the maximal absolute
> >>> value) are ‘zapped’, i.e., replaced by 0.
>>>>>> and doesn’t provide any details about how ‘close to zero’ is defined.
> >>>>>>
>>>>>> Perhaps handling the special when `x` is a scalar (or only contains a
> >>> single non-NA value) would make sense:
>>>>>> zapsmall <-
>>>>>> function (x, digits = getOption("digits"))
>>>>>> {
>>>>>> if (length(digits) == 0L)
>>>>>> stop("invalid 'digits'")
>>>>>> if (all(ina <- is.na(x)))
>>>>>> return(x)
>>>>>> mx <- max(abs(x[!ina]))
>>>>>> round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 ) max(0L,
> >>> digits - as.numeric(log10(mx))) else digits)
>>>>>> }
> >>>>>>
>>>>>> Yielding:
> >>>>>>
> >>>>>>> y <- 2.220446e-16
> >>>>>>> zapsmall(y)
>>>>>> [1] 0
> >>>>>>
>>>>>> Another edge case would be when all of the non-na values are the same:
> >>>>>>
> >>>>>>> y <- 2.220446e-16
> >>>>>>> zapsmall(c(y,y))
>>>>>> [1] 2.220446e-16 2.220446e-16
> >>>>>>
>>>>>> Thoughts?
> >>>>>>
> >>>>>>
>>>>>> Gregory R. Warnes, Ph.D.
>>>>>> greg using warnes.net
>>>>>> Eternity is a long time, take a friend!
> >>>>>>
> >>>>>>
> >>>>>>
>>>>>> [[alternative HTML version deleted]]
> >>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel using r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-devel using r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>> ______________________________________________
> >>> R-devel using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> > --
> > Serguei Sokol
> > Ingenieur de recherche INRAE
>
> > Cellule Mathématiques
> > TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
> > 135 Avenue de Rangueil
> > 31077 Toulouse Cedex 04
>
> > tel: +33 5 61 55 98 49
> > email: sokol using insa-toulouse.fr
> > https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/
>
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list