[R] Improvement: function cut

Leonard Mada |eo@m@d@ @end|ng |rom @yon|c@eu
Sat Sep 18 00:44:12 CEST 2021


The warning should be issued inside cut(), in the step that calls .bincode().

It should be generated whenever a finite value (i.e. not NA, NaN, or 
+/- Inf) is not included in any of the bins.
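
For illustration, a small sketch of the current behaviour (the data and 
breaks are made up): values that fall outside every bin silently become NA.

```r
x <- c(0, 0.5, 1, 2, 3)
breaks <- c(0, 1, 2)

# Defaults: right = TRUE, include.lowest = FALSE,
# so the intervals are (0,1] and (1,2].
# 0 (equal to the lowest break) and 3 (beyond the last break)
# both become NA, and no warning is issued.
res <- cut(x, breaks)
print(res)

# include.lowest = TRUE closes the first interval: [0,1], (1,2].
# 0 is now binned; 3 is still NA.
res_incl <- cut(x, breaks, include.lowest = TRUE)
print(res_incl)
```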


If the user writes a script and does not want any warnings, they can set 
warn = FALSE. Otherwise it would be very helpful to catch the error 
immediately (and not after a number of steps, or miss it altogether).
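
Until something like this exists, the check can be approximated with a 
wrapper. This is only a minimal sketch: cut_warn() and its 'warn' argument 
are hypothetical names, not part of base R, and the check is done after 
the call to cut() rather than inside .bincode().

```r
# cut_warn() is a hypothetical helper, not part of base R.
cut_warn <- function(x, breaks, ..., warn = TRUE) {
  res <- cut(x, breaks, ...)
  if (warn) {
    # Finite input values that ended up in no bin.
    # NA, NaN and +/-Inf in the input are deliberately not counted.
    missed <- is.finite(x) & is.na(res)
    if (any(missed)) {
      warning(sum(missed), " finite value(s) not included in any bin",
              call. = FALSE)
    }
  }
  res
}

x <- c(0, 0.5, 3, NA, Inf)

# Capture the warning message instead of letting it print:
# 0 and 3 are finite but fall in no bin.
msg <- tryCatch(cut_warn(x, c(0, 1, 2)),
                warning = function(w) conditionMessage(w))
print(msg)

# Same result as plain cut(), with the check switched off.
quiet <- cut_warn(x, c(0, 1, 2), warn = FALSE)
```

With include.lowest = TRUE passed through '...', only the value 3 would 
be reported in this example.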


Leonard


On 9/18/2021 1:28 AM, Jeff Newmiller wrote:
> Your objection that "the user has to suspect that some values were not included" applies equally to your proposed warn option. There are many ways to introduce NAs; in real projects, every analyst should suspect this problem.
>
> On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help <r-help using r-project.org> wrote:
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make include.lowest = TRUE the default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not included and run
>> that test.
>>
>>
>> Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>> Regarding your first point, argument 'include.lowest' already handles
>>> this specific case, see ?.bincode
>>>
>>> As for your second point, it could be helpful, but since both
>>> 'cut.default' and '.bincode' return NA if a value isn't within a bin,
>>> you could write something like this on your own.
>>> Might be worth pitching to R-bugs on the wishlist.
>>>
>>>
>>>
>>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>> <r-help using r-project.org> wrote:
>>>
>>>      Hello List members,
>>>
>>>
>>>      the following improvements would be useful for function cut (and
>>>      .bincode):
>>>
>>>
>>>      1.) Argument: Include extremes
>>>      extremes = TRUE
>>>      if(right == FALSE) {
>>>          # include also right for last interval;
>>>      } else {
>>>          # include also left for first interval;
>>>      }
>>>
>>>
>>>      2.) Argument: warn = TRUE
>>>
>>>      Warn if any values are not included in the intervals.
>>>
>>>
>>>      Motivation:
>>>      - reduce risk of errors when using function cut();
>>>
>>>
>>>      Sincerely,
>>>
>>>
>>>      Leonard
>>>
>>>      ______________________________________________
>>>      R-help using r-project.org mailing list --
>>>      To UNSUBSCRIBE and more, see
>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>      PLEASE do read the posting guide
>>>      http://www.R-project.org/posting-guide.html
>>>      and provide commented, minimal, self-contained, reproducible code.
>>>


