[R] Compressing a sequence

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Sat Feb 22 23:23:22 CET 2025


Hi Ben:

I realize that for the OP whether it takes 1/2 second or 1 microsecond to
do what he wants may be irrelevant, but just for fun I thought I'd time the
condense() function you found vs. the compr() function I worked out, which
are similar in their approach.

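compr <- function(x, sep = "-")
{
   ## TRUE where a run starts (previous value is not x - 1)
   left <- c(TRUE, diff(x) != 1)
   ## TRUE where a run ends (next value is not x + 1)
   right <- c(rev(diff(rev(x)) != -1), TRUE)
   xleft <- x[left]
   xright <- x[right]
   ## a one-element run prints as a single number, otherwise as "start-end"
   ifelse(xleft == xright,
          xleft,
          paste(xleft, xright, sep = sep))
}

Results were: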

set.seed(4456)
x <- sort(sample(seq_len(1000), 800))
all(compr(x) == condense(x))  ##TRUE

library(microbenchmark)
> microbenchmark(compr(x), condense(x), times = 500)
Unit: microseconds
        expr      min       lq       mean    median        uq      max neval
    compr(x)   65.067   67.773   70.45071   69.0235   71.1145  113.857   500
 condense(x) 1022.089 1031.334 1116.02812 1036.2135 1049.8050 5249.271   500

As usual, YMMV, but the roughly 15-fold gap in median time comes from
compr() being fully vectorized, while condense() calls an R function once
per run via tapply().
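
To make the vectorized idea concrete, here is a minimal sketch that builds
both run-boundary masks from a single diff() call. It assumes x is sorted
and has no duplicates, as in the example above. The name compr2 is made up
for this illustration and is not part of the thread's timings.

```r
# Sketch only: same approach as compr(), with one diff() call for both masks.
# compr2 is a hypothetical name used for illustration.
compr2 <- function(x, sep = "-") {
  brk <- diff(x) != 1          # TRUE where a gap follows element i
  first <- x[c(TRUE, brk)]     # first element of each run
  last <- x[c(brk, TRUE)]      # last element of each run
  ifelse(first == last,
         as.character(first),
         paste(first, last, sep = sep))
}

z <- c(1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20)
out <- paste(compr2(z), collapse = ", ")
print(out)
```

Every step works on whole vectors, so no R-level function is called per
run; that is the property the timing difference reflects.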

Cheers,
Bert

"An educated person is one who can entertain new ideas, entertain others,
and entertain herself."



On Fri, Feb 21, 2025 at 6:27 PM Ben Bolker <bbolker using gmail.com> wrote:

>    And some more from 2013:
>
> https://stackoverflow.com/questions/14868406/collapse-continuous-integer-runs-to-strings-of-ranges
>
>    Can be as short as:
>
> condense <- function(x)
>    unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y)
>      paste(unique(range(y)), collapse = "-")
>    ))
>
> z <- c(1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20)
>
> condense(z) |> paste(collapse = ", ")
>
> "1, 3-5, 7-8, 12-15, 20"
>
> On 2025-02-21 9:16 p.m., Ben Bolker wrote:
> >    There are some answers from 2016 here:
> >
> > https://stackoverflow.com/questions/34636461/collapse-consecutive-runs-of-numbers-to-a-string-of-ranges
> >
> > On 2025-02-21 7:59 p.m., Steven Ellis wrote:
> >> Hi Dennis,
> >>
> >> A quick Claude request:
> >>
> >> "using r I have a sequence like:        1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20
> >> I would like to display it as:        1, 3-5, 7-8, 12-15, 20"
> >>
> >> yielded:
> >>
> >> condense_sequence <- function(nums) {
> >>    if (length(nums) == 0) return("")
> >>    if (length(nums) == 1) return(as.character(nums))
> >>
> >>    # Sort the numbers just in case they're not in order
> >>    nums <- sort(unique(nums))
> >>
> >>    # Initialize variables
> >>    ranges <- vector("character")
> >>    start <- nums[1]
> >>    prev <- nums[1]
> >>
> >>    # seq_along(nums)[-1] is empty if one unique value remains; 2:1 would count down
> >>    for (i in seq_along(nums)[-1]) {
> >>      if (nums[i] != prev + 1) {
> >>        # End of a sequence
> >>        if (start == prev) {
> >>          ranges <- c(ranges, as.character(start))
> >>        } else {
> >>          ranges <- c(ranges, paste(start, prev, sep="-"))
> >>        }
> >>        start <- nums[i]
> >>      }
> >>      prev <- nums[i]
> >>    }
> >>
> >>    # Handle the last number or range
> >>    if (start == prev) {
> >>      ranges <- c(ranges, as.character(start))
> >>    } else {
> >>      ranges <- c(ranges, paste(start, prev, sep="-"))
> >>    }
> >>
> >>    # Join all ranges with commas
> >>    paste(ranges, collapse=", ")
> >> }
> >>
> >> # Your sequence
> >> nums <- c(1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20)
> >>
> >> # Apply the function
> >> result <- condense_sequence(nums)
> >> print(result)
> >> # Output: "1, 3-5, 7-8, 12-15, 20"
> >>
> >> Which appears to work well, though you may have other thoughts in mind /
> >> edge cases this code does not cover.
> >>
> >> Best,
> >> Steven
> >>
> >> On Fri, Feb 21, 2025 at 7:47 PM Dennis Fisher <fisher using plessthan.com>
> >> wrote:
> >>
> >>> R 4.4.0
> >>> OS X
> >>>
> >>> Colleagues
> >>>
> >>> I have a sequence like:
> >>>          1, 3, 4, 5, 7, 8, 12, 13, 14, 15, 20
> >>>
> >>> I would like to display it as:
> >>>          1, 3-5, 7-8, 12-15, 20
> >>>
> >>> Any simple ways to accomplish this?
> >>>
> >>> Dennis
> >>>
> >>>
> >>> Dennis Fisher MD
> >>> P < (The "P Less Than" Company)
> >>> Phone / Fax: 1-866-PLessThan (1-866-753-7784)
> >>> www.PLessThan.com
> >>>
> >>>
> >>>          [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> https://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
>
