[Rd] [External] Re: zapsmall(x) for scalar x

Serguei Sokol @oko| @end|ng |rom |n@@-tou|ou@e@|r
Mon Dec 18 10:29:02 CET 2023


Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> I think what's been missed is that zapsmall works relative to the absolute
> largest value in the vector. Hence if there's only one
> item in the vector, it is the largest, so its not zapped. The function's
> raison d'etre isn't to replace absolutely small values,
> but small values relative to the largest. Hence a vector of similar tiny
> values doesn't get zapped.
>
> Maybe the line in the docs:
>
> " (compared with the maximal absolute value)"
>
> needs to read:
>
> " (compared with the maximal absolute value in the vector)"
I agree that this change in the doc would clarify the situation but 
would not resolve proposed corner cases.
I think that an additional argument 'mx' (absolute max value of 
reference) would do. Consider:

zapsmall2 <-
function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
{
     if (length(digits) == 0L)
         stop("invalid 'digits'")
     if (all(ina <- is.na(x)))
         return(x)
     round(x, digits = if (mx > 0) max(0L, digits - 
as.numeric(log10(mx))) else digits)
}

then zapsmall2() without explicit 'mx' behaves identically to actual 
zapsmall() and for a scalar or a vector of identical value, user can 
manually fix the scale of what should be considered as small:

 > zapsmall2(y)
[1] 2.220446e-16
 > zapsmall2(y, mx=1)
[1] 0
 > zapsmall2(c(y, y), mx=1)
[1] 0 0
 > zapsmall2(c(y, NA))
[1] 2.220446e-16           NA
 > zapsmall2(c(y, NA), mx=1)
[1]  0 NA

Obviously, the name 'zapsmall2' was chosen just for this explanation. 
The original name 'zapsmall' could be reused as a full backward 
compatibility is preserved.

Best,
Serguei.

>
> Barry
>
>
>
>
>
> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>
>> This email originated outside the University. Check before clicking links
>> or attachments.
>>
>> I'm really confused.  Steve's example wasn't a scalar x, it was a
>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
>> see why summary() would if it was using your proposal.
>>
>> Duncan Murdoch
>>
>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
>>> Isn’t that the correct outcome?  The user can change the number of
>> digits if they want to see small values…
>>>
>>> --
>>> Change your thoughts and you change the world.
>>> --Dr. Norman Vincent Peale
>>>
>>>> On Dec 17, 2023, at 12:11 AM, Steve Martin <stevemartin041 using gmail.com>
>> wrote:
>>>> Zapping a vector of small numbers to zero would cause problems when
>>>> printing the results of summary(). For example, if
>>>> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
>>>> print(summary(2.220446e-16), digits = 7) would print
>>>>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>          0          0            0           0           0          0
>>>>
>>>> The same problem can also appear when printing the results of
>>>> summary.glm() with show.residuals = TRUE if there's little dispersion
>>>> in the residuals.
>>>>
>>>> Steve
>>>>
>>>>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes <greg using warnes.net> wrote:
>>>>>
>>>>> I was quite suprised to discover that applying `zapsmall` to a scalar
>> value has no apparent effect.  For example:
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(y,)
>>>>> [1] 2.2204e-16
>>>>>
>>>>> I was expecting zapsmall(x)` to act like
>>>>>
>>>>>> round(y, digits=getOption('digits'))
>>>>> [1] 0
>>>>>
>>>>> Looking at the current source code, indicates that `zapsmall` is
>> expecting a vector:
>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0) max(0L, digits -
>> as.numeric(log10(mx))) else digits)
>>>>> }
>>>>>
>>>>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
>>>>>
>>>>> The man page simply states:
>>>>> zapsmall determines a digits argument dr for calling round(x, digits =
>> dr) such that values close to zero (compared with the maximal absolute
>> value) are ‘zapped’, i.e., replaced by 0.
>>>>> and doesn’t provide any details about how ‘close to zero’ is defined.
>>>>>
>>>>> Perhaps handling the special when `x` is a scalar (or only contains a
>> single non-NA value)  would make sense:
>>>>> zapsmall <-
>>>>> function (x, digits = getOption("digits"))
>>>>> {
>>>>>      if (length(digits) == 0L)
>>>>>          stop("invalid 'digits'")
>>>>>      if (all(ina <- is.na(x)))
>>>>>          return(x)
>>>>>      mx <- max(abs(x[!ina]))
>>>>>      round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 ) max(0L,
>> digits - as.numeric(log10(mx))) else digits)
>>>>> }
>>>>>
>>>>> Yielding:
>>>>>
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(y)
>>>>> [1] 0
>>>>>
>>>>> Another edge case would be when all of the non-na values are the same:
>>>>>
>>>>>> y <- 2.220446e-16
>>>>>> zapsmall(c(y,y))
>>>>> [1] 2.220446e-16 2.220446e-16
>>>>>
>>>>> Thoughts?
>>>>>
>>>>>
>>>>> Gregory R. Warnes, Ph.D.
>>>>> greg using warnes.net
>>>>> Eternity is a long time, take a friend!
>>>>>
>>>>>
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Serguei Sokol
Ingenieur de recherche INRAE

Cellule Mathématiques
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 61 55 98 49
email: sokol using insa-toulouse.fr
https://www.toulouse-biotechnology-institute.fr/en/plateformes-plateaux/cellule-mathematiques/



More information about the R-devel mailing list