[R] Efficient way to find consecutive integers in vector?

Fri Dec 21 15:54:22 CET 2007

>>>>> "MS" == Marc Schwartz <marc_schwartz at comcast.net>
>>>>>     on Thu, 20 Dec 2007 16:33:54 -0600 writes:

    MS> On Thu, 2007-12-20 at 22:43 +0100, Johannes Graumann wrote:
    >> Hi all,
    >> 
    >> Does anybody have a magic trick handy to isolate directly consecutive
    >> integers from something like this:
    >> c(1,2,3,4,7,8,9,10,12,13)
    >> 
    >> The result should be, that groups 1-4, 7-10 and 12-13 are consecutive
    >> integers ...
    >> 
    >> Thanks for any hints, Joh

    MS> Not fully tested, but here is one possible approach:

    MS> > Vec
    MS> [1]  1  2  3  4  7  8  9 10 12 13

    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))

    MS> > Breaks
    MS> [1]  0  4  8 10

    MS> > sapply(seq(length(Breaks) - 1),
    MS>          function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4

    MS> [[2]]
    MS> [1]  7  8  9 10

    MS> [[3]]
    MS> [1] 12 13

    MS> For a quick test, I tried it on another vector:

    MS> set.seed(1)
    MS> Vec <- sort(sample(20, 15))

    MS> > Vec
    MS> [1]  1  2  3  4  5  6  8  9 10 11 14 15 16 19 20

    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))

    MS> > Breaks
    MS> [1]  0  6 10 13 15

    MS> > sapply(seq(length(Breaks) - 1),
    MS>          function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4 5 6

    MS> [[2]]
    MS> [1]  8  9 10 11

    MS> [[3]]
    MS> [1] 14 15 16

    MS> [[4]]
    MS> [1] 19 20
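
For comparison, the same grouping can probably be had in one step with
split() and cumsum(). This is only a sketch and not part of Marc's
message: the name 'runs' is made up here, and the code assumes a
non-empty, increasing vector, just as the Breaks approach does.

```r
Vec <- c(1, 2, 3, 4, 7, 8, 9, 10, 12, 13)

## A new group starts at the first element and wherever the step
## from the previous element is not exactly 1.
starts <- c(TRUE, diff(Vec) != 1)

## cumsum() turns the start flags into a group id per element;
## split() then collects the elements of each group into a list.
runs <- split(Vec, cumsum(starts))
```

Unlike sapply(), split() always returns a list, so the result keeps the
same shape even when all runs happen to have equal length.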

Seems ok, but it only works for increasing sequences.
More than 12 years ago, I encountered the same problem and
solved it like this:

Package 'sfsmisc' has long contained the function  inv.seq(),
named for "inversion of seq()", which does this too.
It currently returns an expression,
but returns a call in the development version of sfsmisc.

Its definition is currently

inv.seq <- function(i) {
  ## Purpose: 'Inverse seq': Return a short expression for the 'index'  `i'
  ## --------------------------------------------------------------------
  ## Arguments: i: vector of (usually increasing) integers.
  ## --------------------------------------------------------------------
  ## Author: Martin Maechler, Date:  3 Oct 95, 18:08
  ## --------------------------------------------------------------------
  ## EXAMPLES: cat(rr <- inv.seq(c(3:12, 20:24, 27, 30:33)),"\n"); eval(rr)
  ##           r2 <- inv.seq(c(20:13, 3:12, -1:-4, 27, 30:31)); eval(r2); r2
  li <- length(i <- as.integer(i))
  if(li == 0) return(expression(NULL))
  else if(li == 1) return(as.expression(i))
  ##-- now have: length(i) >= 2
  di1 <- abs(diff(i)) == 1 #-- those are just simple sequences  n1:n2 !
  s1 <- i[!c(FALSE,di1)] # beginnings
  s2 <- i[!c(di1,FALSE)] # endings

  ## using text & parse {cheap and dirty} :
  mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
  parse(text =
        paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
        srcfile = NULL)[[1]]
}

with example code

 > v <- c(1:10,11,6,5,4,0,1)
 > (iv <- inv.seq(v))
 c(1:11, 6:4, 0:1)
 > stopifnot(identical(eval(iv), as.integer(v)))
 > iv[[2]]
 1:11
 > str(iv)
  language c(1:11, 6:4, 0:1)
 > str(iv[[2]])
  language 1:11
 > 
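
To see how the beginnings and endings are found, it may help to run the
middle part of inv.seq() by hand on the example vector. The following is
just those three statements lifted out of the function body, with
comments added:

```r
v <- as.integer(c(1:10, 11, 6, 5, 4, 0, 1))

## TRUE where two neighbours differ by exactly 1 (up or down)
di1 <- abs(diff(v)) == 1

## an element begins a run if it is not 1 away from its predecessor
s1 <- v[!c(FALSE, di1)]   # 1 6 0
## an element ends a run if it is not 1 away from its successor
s2 <- v[!c(di1, FALSE)]   # 11 4 1
```

Pairing s1 with s2 gives 1:11, 6:4 and 0:1. Note that, because of the
abs(), a change of direction inside a run such as c(1,2,3,2,1) is not
detected as a break, so such input does not round-trip.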

Now, given that this stems from  1995,  I should be excused for
using   parse(text = *)  [see  fortune(106) if you don't understand].

However, doing this differently by constructing the resulting
language object directly {using substitute(), as.symbol(),
as.expression() ... etc} seems not quite trivial.

So here's the Friday afternoon / Christmas break quiz:

  What's the most elegant way
  to replace the last statements in  inv.seq()
  ------------------------------------------------------------------------
  ## using text & parse {cheap and dirty} :
  mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
  parse(text =
        paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
        srcfile = NULL)[[1]]
  ------------------------------------------------------------------------

  by code that does not use parse (or source() or similar) ???

I don't have an answer yet, at least not an elegant one.
And maybe the solution to the quiz is that there is no elegant
solution.
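
One candidate answer, offered only as a sketch and not as what sfsmisc
does: build each  b:e  piece with call() and glue the pieces together
with as.call(). The helper name mk_call() is made up for this
illustration, and it takes the beginnings s1 and endings s2 as
computed inside inv.seq().

```r
## Hypothetical helper: build the call c(b1:e1, b2:e2, ...) directly,
## without going through text and parse().
mk_call <- function(s1, s2) {
  ## Use doubles, since the parser also produces doubles for
  ## constants like 1 and 11.
  s1 <- as.numeric(s1)
  s2 <- as.numeric(s2)
  ## a single number if the run has length 1, else the call `:`(b, e)
  mkseq <- function(b, e) if (b == e) b else call(":", b, e)
  ## SIMPLIFY = FALSE keeps the pieces in a list, as as.call() needs
  pieces <- mapply(mkseq, s1, s2, SIMPLIFY = FALSE)
  as.call(c(list(as.name("c")), pieces))
}

## beginnings and endings of the runs in c(1:10,11,6,5,4,0,1)
iv <- mk_call(c(1L, 6L, 0L), c(11L, 4L, 1L))
iv
eval(iv)
```

Whether this counts as elegant is of course a matter of taste.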

Martin

    MS> HTH,

    MS> Marc Schwartz