[R] Compressing a sequence

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Sat Feb 22 20:28:40 CET 2025


Hello,

Inline.

At 18:47 on 22/02/2025, avi.e.gross using gmail.com wrote:
> Rui,
> 
> Your post (way at the bottom) expands beyond what was requested and that makes it more interesting.
> 
> I like to look at what I consider reusable ideas and tools.
> 
> One was the reminder of using apply across rows of a matrix which would have applied to my earlier post of a package that also created a matrix of sequential followed by a count.
> 
> Another was the idea of creating a changed object that now has an additional class that can determine which new print function to use automagically. I do wonder what happens if there are other classes that also have such a function. Does the first class in the listing dominate?

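Yes, S3 dispatch uses the first class in the class vector that has a 
method. To call the method of the next class from inside that method, 
use NextMethod().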
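
A minimal sketch of that dispatch order. The classes "a" and "b" and their print methods are hypothetical, made up only for this illustration:

```r
# Hypothetical classes "a" and "b", used only to show the dispatch order
obj <- structure(1:3, class = c("a", "b"))

print.a <- function(x, ...) {
  cat("print.a called\n")
  NextMethod()   # hands over to print.b, the next class in class(x)
}
print.b <- function(x, ...) {
  cat("print.b called\n")
  print(unclass(x), ...)
  invisible(x)
}

# capture what print(obj) writes, to make the order of the calls visible
out <- capture.output(print(obj))
out
#> [1] "print.a called" "print.b called" "[1] 1 2 3"
```

Without the NextMethod() call in print.a, print.b would never run for this object.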

Continues at the end.

> 
> Finally, I have been considering the broader question of how to create a more abstract function along the lines being discussed.
> 
> This request was about integers, where being contiguous meant being one apart. Clearly, there can be cases where you want, say, just contiguous even (or odd) numbers, so that c(2, 6, 8, 10) becomes "2, 6-10", as in street addresses for deliveries on one side of a street where all addresses on that side are even. That could be handled by not hard-coding a difference of 1, but providing a successor() function that returns TRUE/FALSE when supplied with two arguments. One method is to allow such a function as an argument.
> 
> A related idea would be to allow other entries where a successor has meaning. Consider c("a", "c", "d", "e", "x", "y") which you can imagine collapsed as "a, c-e, x-y" as one example. But you can broaden it to work with something like Roman numerals in text form. Having a successor function, as above, could allow such additional functionality.
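
The successor idea above could be sketched in base R as follows. The names compress_by() and is_successor are hypothetical, the successor function is assumed to be vectorised, and the input is assumed to be already ordered and free of NA:

```r
# compress_by() and is_successor are hypothetical names for this sketch.
# is_successor(a, b) must be vectorised and return TRUE where b follows a.
compress_by <- function(x, is_successor = function(a, b) b - a == 1,
                        range_sep = "-") {
  n <- length(x)
  if (n == 0L) return(character(0))
  # TRUE where an element does NOT follow on from its left neighbour
  breaks <- if (n > 1L) !is_successor(x[-n], x[-1L]) else logical(0)
  run <- cumsum(c(TRUE, breaks))                 # run id of every element
  first <- x[!duplicated(run)]                   # first element of each run
  last  <- x[!duplicated(run, fromLast = TRUE)]  # last element of each run
  ifelse(first == last, as.character(first),
         paste(first, last, sep = range_sep))
}

# even house numbers: the successor is two apart
evens <- compress_by(c(2, 6, 8, 10), function(a, b) b - a == 2)
evens
#> [1] "2"    "6-10"

# lower case letters: the successor is the next entry in 'letters'
next_letter <- function(a, b) match(b, letters) - match(a, letters) == 1
lets <- compress_by(c("a", "c", "d", "e", "x", "y"), next_letter)
lets
#> [1] "a"   "c-e" "x-y"
```

With the default successor the function gives the same strings as the R.utils based code for the original example.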
> 
> And, of course, other options could be allowed, perhaps just passed through, so that you can specify the sep(arator) or col(lapse) strings. Instead of this:
> 
> "1, 3-6, 12-19"
> 
> You might see:
> 
> 1
> 3:6
> 12:19
> 
> As in multiple lines and colon separated.
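
A minimal sketch of such options, in base R only. The function name format_runs() and its arguments range_sep and collapse are hypothetical, and contiguous is taken to mean one apart:

```r
# format_runs(), range_sep and collapse are hypothetical names
format_runs <- function(x, range_sep = "-", collapse = ", ") {
  x <- sort(unique(x))
  if (length(x) == 0L) return("")
  run <- cumsum(c(1, diff(x) != 1))   # a new run starts where the gap is not 1
  pieces <- vapply(split(x, run), function(r) {
    if (length(r) == 1L) {
      as.character(r)
    } else {
      paste(r[1L], r[length(r)], sep = range_sep)
    }
  }, character(1))
  paste(pieces, collapse = collapse)
}

v <- c(1, 3:6, 12:19)
format_runs(v)
#> [1] "1, 3-6, 12-19"

# colon separated, one run per line
cat(format_runs(v, range_sep = ":", collapse = "\n"), "\n")
#> 1
#> 3:6
#> 12:19
```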
> 
> But one question I keep asking is about complexity. We have seen a number of solutions and some of them require many passes over what could be a long vector and other data structures like matrices with quite a few passes. It is nice to re-use bits and pieces but perhaps for longer examples, not in prototyping mode, writing a custom function that passes minimally over the data, may be a better choice.
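
A sketch of a function that walks the vector only once, assuming a sorted numeric vector without NA or duplicates. The name compress_one_pass() is hypothetical. Whether an explicit loop like this actually beats the vectorised multi-pass solutions in R is something to benchmark, not something I am claiming:

```r
# compress_one_pass() is a hypothetical name for this sketch
compress_one_pass <- function(x) {
  n <- length(x)
  if (n == 0L) return(character(0))
  out <- character(n)   # preallocated for the worst case, no runs at all
  k <- 0L               # number of pieces written so far
  start <- x[1L]        # first value of the current run
  for (i in seq_len(n)) {
    # a run ends at the last element or when the next value is not x[i] + 1
    if (i == n || x[i + 1L] != x[i] + 1) {
      k <- k + 1L
      out[k] <- if (start == x[i]) {
        as.character(start)
      } else {
        paste(start, x[i], sep = "-")
      }
      if (i < n) start <- x[i + 1L]
    }
  }
  out[seq_len(k)]
}

res <- compress_one_pass(c(1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20))
res
#> [1] "1"     "3-5"   "7-8"   "12-15" "20"
```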
> 
> 
> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
> Sent: Saturday, February 22, 2025 7:36 AM
> To: Dennis Fisher <fisher using plessthan.com>; r-help <r-help using r-project.org>
> Subject: Re: [R] Compressing a sequence
> 
> At 00:46 on 22/02/2025, Dennis Fisher wrote:
>> R 4.4.0
>> OS X
>>
>> Colleagues
>>
>> I have a sequence like:
>> 	1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20
>>
>> I would like to display it as:
>> 	1, 3-5, 7-8, 12-15, 20
>>
>> Any simple ways to accomplish this?
>>
>> Dennis
>>
>>
>> Dennis Fisher MD
>> P < (The "P Less Than" Company)
>> Phone / Fax: 1-866-PLessThan (1-866-753-7784)
>> www.PLessThan.com
>>
>>
> Hello,
> 
> Here is a way with package R.utils, function seqToIntervals.
> 
> 
> x <- scan(text = "1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20", sep = ",")
> 
> mat <- R.utils::seqToIntervals(x)
> apply(mat, 1L, \(m) {
>     ifelse(m[1L] == m[2L], m[1L], paste(m, collapse = "-"))
> })
> #> [1] "1"     "3-5"   "7-8"   "12-15" "20"
> 
> 
> If you want to be fancy, define a special class that prints like that.
> 
> 
> 
> x <- scan(text = "1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20", sep = ",")
> 
> as_seqInterval <- function(x) {
>     old_class <- class(x)
>     class(x) <- c("seqInterval", old_class)
>     x
> }
> print.seqInterval <- function(x, ...) {
>     mat <- R.utils::seqToIntervals(x)
>     out <- apply(mat, 1L, \(m) {
>       ifelse(m[1L] == m[2L], m[1L], paste(m, collapse = "-"))
>     })
>     print(out)
>     invisible(x)  # print methods conventionally return their argument invisibly
> }
> 
> y <- as_seqInterval(x)
> class(y)
> #> [1] "seqInterval" "numeric"
> 
> # autoprinting y
> y
> #> [1] "1"     "3-5"   "7-8"   "12-15" "20"
> 
> # explicit printing y
> print(y)
> #> [1] "1"     "3-5"   "7-8"   "12-15" "20"
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 

With the example from my first post, there is another advantage in using 
a new class. Since the new class "seqInterval" sub-classes "numeric", 
"seqInterval" objects can be added, subtracted, etc., and they keep the 
expected behaviour of numbers.



x2 <- c(1:3, 5, 6)
length(x)
#> [1] 11
length(x2)
#> [1] 5

y2 <- as_seqInterval(x2)
y2
#> [1] "1-3" "5-6"
x + x2
#> Warning in x + x2: longer object length is not a multiple of shorter object length
#>  [1]  2  5  7 10 13  9 14 16 19 21 21

# make it more obvious that 9, 10 and 13, 14 should be compressed
sort(x + x2)
#> Warning in x + x2: longer object length is not a multiple of shorter object length
#>  [1]  2  5  7  9 10 13 14 16 19 21 21

# adding two objects of class "seqInterval" keeps the compressing property (see y + y2)
as_seqInterval(x + x2)
#> Warning in x + x2: longer object length is not a multiple of shorter object length
#> [1] "2"     "5"     "7"     "9-10"  "13-14" "16"    "19"    "21"
y + y2
#> Warning in y + y2: longer object length is not a multiple of shorter object length
#> [1] "2"     "5"     "7"     "9-10"  "13-14" "16"    "19"    "21"

# now a division: the results are not integers, and the printed
# intervals correspond to the truncated values
y/2
#> [1] "0-4" "6-7" "10"
unclass(y/2)
#> [1]  0.5  1.5  2.0  2.5  3.5  4.0  6.0  6.5  7.0  7.5 10.0



Hope this helps,

Rui Barradas

