[R] Improvement: function cut

Leonard Mada <leo.mada using syonic.eu>
Sat Sep 18 00:26:08 CEST 2021


Hello Andrew,


But "cut" generates factors. In most cases with real data one expects the 
end points of the range to be included as well, and the argument name 
"include.lowest" is both ugly and too long.
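
A minimal illustration of the end-point behaviour, using made-up values 
(x and breaks are only for demonstration):

```r
x <- c(0, 5, 10)
breaks <- c(0, 5, 10)

# Default (right = TRUE, include.lowest = FALSE): the intervals are
# (0,5] and (5,10], so the lowest value 0 is not binned and becomes NA.
f_default <- cut(x, breaks)

# include.lowest = TRUE closes the first interval: [0,5] and (5,10].
f_closed <- cut(x, breaks, include.lowest = TRUE)

print(f_default)
print(f_closed)
```

With the default, the minimum of the data is silently turned into NA 
whenever it coincides with the first break.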

[The test code in the ftable thread contains this error! I have run 
into this error a couple of times.]
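
Until something like "warn" exists, a small wrapper can catch the 
problem. This is only a sketch: the name cut_warn and its default of 
include.lowest = TRUE are my own assumptions, not part of base R.

```r
# Hypothetical helper (not in base R): calls cut() and warns when
# non-missing values fall outside all intervals.
cut_warn <- function(x, breaks, ..., include.lowest = TRUE) {
    res <- cut(x, breaks, ..., include.lowest = include.lowest)
    # NA after binning but not NA on input: the value was not included.
    dropped <- is.na(res) & !is.na(x)
    if (any(dropped)) {
        warning(sum(dropped), " value(s) not included in any interval")
    }
    res
}

# The values 17:20 lie beyond the last break, so this call warns.
res <- suppressWarnings(cut_warn(0:20, seq.int(0, 16, 4)))
```

Values that were already NA on input are not counted, so the warning 
only reports values lost by the binning itself.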


The only real situation that I can imagine to be problematic:

- if the interval goes to +Inf (or -Inf): I do not know if there would 
be any effects when including +Inf (or -Inf).
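
A quick check with made-up values suggests that infinite breaks behave 
like any other end point; I compare the integer codes rather than the 
labels:

```r
x <- c(-Inf, 0, 5, Inf)
breaks <- c(-Inf, 0, Inf)

# Default: intervals (-Inf, 0] and (0, Inf]; -Inf equals the lowest
# break and is therefore not binned.
codes_default <- as.integer(cut(x, breaks))

# include.lowest = TRUE closes the first interval, so -Inf is binned too.
codes_closed <- as.integer(cut(x, breaks, include.lowest = TRUE))

print(codes_default)
print(codes_closed)
```

In this small test only the value equal to the lowest break changes 
between the two calls; +Inf is already covered by the closed right end 
of the last interval.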


Leonard


On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> While it is not explicitly mentioned anywhere in the documentation for 
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep 
> the definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> by having 'include.lowest = TRUE' with different ends, you can get 
> inconsistent behaviour. While this probably wouldn't be an issue with 
> 'real' data, this would seem like something you'd want to avoid by 
> default. The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
> so you can see where the inconsistent behaviour comes from. You might 
> be able to get R-core to add argument 'warn', but probably not to 
> change the default of 'include.lowest'. I hope this helps
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada using syonic.eu> 
> wrote:
>
>     Thank you Andrew.
>
>
>     Is there any reason not to make: include.lowest = TRUE the default?
>
>
>     Regarding the NA:
>
>     The user still has to suspect that some values were not included
>     and run that test.
>
>
>     Leonard
>
>
>     On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>     Regarding your first point, argument 'include.lowest' already
>>     handles this specific case, see ?.bincode
>>
>>     Your second point, maybe it could be helpful, but since both
>>     'cut.default' and '.bincode' return NA if a value isn't within a
>>     bin, you could make something like this on your own.
>>     Might be worth pitching to R-bugs on the wishlist.
>>
>>
>>
>>     On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>     <r-help using r-project.org> wrote:
>>
>>         Hello List members,
>>
>>
>>         the following improvements would be useful for function cut
>>         (and .bincode):
>>
>>
>>         1.) Argument: Include extremes
>>         extremes = TRUE
>>         if(right == FALSE) {
>>             # include also right for last interval;
>>         } else {
>>             # include also left for first interval;
>>         }
>>
>>
>>         2.) Argument: warn = TRUE
>>
>>         Warn if any values are not included in the intervals.
>>
>>
>>         Motivation:
>>         - reduce risk of errors when using function cut();
>>
>>
>>         Sincerely,
>>
>>
>>         Leonard
>>
>>         ______________________________________________
>>         R-help using r-project.org mailing list -- To UNSUBSCRIBE
>>         and more, see
>>         https://stat.ethz.ch/mailman/listinfo/r-help
>>         PLEASE do read the posting guide
>>         http://www.R-project.org/posting-guide.html
>>         and provide commented, minimal, self-contained, reproducible
>>         code.
>>
