[R] Efficient way to find consecutive integers in vector?

Sat Dec 22 12:29:05 CET 2007

>>>>> "TP" == Tony Plate <tplate at acm.org>
>>>>>     on Fri, 21 Dec 2007 18:17:18 -0700 writes:

    TP> Martin Maechler wrote:
    >>>>>>> "MS" == Marc Schwartz <marc_schwartz at comcast.net>
    >>>>>>> on Thu, 20 Dec 2007 16:33:54 -0600 writes:
    >> 
    MS> On Thu, 2007-12-20 at 22:43 +0100, Johannes Graumann wrote:
    >> >> Hi all,
    >> >> 
    >> >> Does anybody have a magic trick handy to isolate directly consecutive
    >> >> integers from something like this:
    >> >> c(1,2,3,4,7,8,9,10,12,13)
    >> >> 
    >> >> The result should be, that groups 1-4, 7-10 and 12-13 are consecutive
    >> >> integers ...
    >> >> 
    >> >> Thanks for any hints, Joh
    >> 
    MS> Not fully tested, but here is one possible approach:
    >> 
    >> >> Vec
    MS> [1]  1  2  3  4  7  8  9 10 12 13
    >> 
    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
    >> 
    >> >> Breaks
    MS> [1]  0  4  8 10
    >> 
    >> >> sapply(seq(length(Breaks) - 1), 
    MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4
    >> 
    MS> [[2]]
    MS> [1]  7  8  9 10
    >> 
    MS> [[3]]
    MS> [1] 12 13
    >> 
    >> 
    >> 
    MS> For a quick test, I tried it on another vector:
    >> 
    >> 
    MS> set.seed(1)
    MS> Vec <- sort(sample(20, 15))
    >> 
    >> >> Vec
    MS> [1]  1  2  3  4  5  6  8  9 10 11 14 15 16 19 20
    >> 
    MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
    >> 
    >> >> Breaks
    MS> [1]  0  6 10 13 15
    >> 
    >> >> sapply(seq(length(Breaks) - 1), 
    MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
    MS> [[1]]
    MS> [1] 1 2 3 4 5 6
    >> 
    MS> [[2]]
    MS> [1]  8  9 10 11
    >> 
    MS> [[3]]
    MS> [1] 14 15 16
    >> 
    MS> [[4]]
    MS> [1] 19 20
    >> 
    >> Seems ok, but ``only works for increasing sequences''.
    >> More than 12 years ago, I had encountered the same problem and
    >> solved it like this:
    >> 
    >> In package 'sfsmisc', there has been the function  inv.seq(),
    >> named for "inversion of seq()",
    >> which does this too, currently returning an expression,
    >> but returning a call in the development version of sfsmisc:
    >> 
    >> Its definition is currently
    >> 
    >> inv.seq <- function(i) {
    >> ## Purpose: 'Inverse seq': Return a short expression for the 'index'  `i'
    >> ## --------------------------------------------------------------------
    >> ## Arguments: i: vector of (usually increasing) integers.
    >> ## --------------------------------------------------------------------
    >> ## Author: Martin Maechler, Date:  3 Oct 95, 18:08
    >> ## --------------------------------------------------------------------
    >> ## EXAMPLES: cat(rr <- inv.seq(c(3:12, 20:24, 27, 30:33)),"\n"); eval(rr)
    >> ##           r2 <- inv.seq(c(20:13, 3:12, -1:-4, 27, 30:31)); eval(r2); r2
    >> li <- length(i <- as.integer(i))
    >> if(li == 0) return(expression(NULL))
    >> else if(li == 1) return(as.expression(i))
    >> ##-- now have: length(i) >= 2
    >> di1 <- abs(diff(i)) == 1	#-- those are just simple sequences  n1:n2 !
    >> s1 <- i[!c(FALSE,di1)] # beginnings
    >> s2 <- i[!c(di1,FALSE)] # endings
    >> 
    >> ## using text & parse {cheap and dirty} :
    >> mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
    >> parse(text =
    >> paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
    >> srcfile = NULL)[[1]]
    >> }
    >> 
    >> with example code
    >> 
    >> > v <- c(1:10,11,6,5,4,0,1)
    >> > (iv <- inv.seq(v))
    >> c(1:11, 6:4, 0:1)
    >> > stopifnot(identical(eval(iv), as.integer(v)))
    >> > iv[[2]]
    >> 1:11
    >> > str(iv)
    >> language c(1:11, 6:4, 0:1)
    >> > str(iv[[2]])
    >> language 1:11
    >> > 
    >> 
    >> 
    >> Now, given that this stems from  1995,  I should be excused for
    >> using   parse(text = *)  [see  fortune(106) if you don't understand].
    >> 
    >> However, doing this differently by constructing the resulting
    >> language object directly {using substitute(), as.symbol(),
    >> as.expression() ... etc}
    >> seems not quite trivial.
    >> 
    >> So here's the Friday afternoon / Christmas break quiz:
    >> 
    >> What's the most elegant way
    >> to replace the last statements in  inv.seq()
    >> ------------------------------------------------------------------------
    >> ## using text & parse {cheap and dirty} :
    >> mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
    >> parse(text =
    >> paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
    >> srcfile = NULL)[[1]]
    >> ------------------------------------------------------------------------
    >> 
    >> by code that does not use parse (or source() or similar) ???
    >> 
    >> I don't have an answer yet, at least not at all an elegant one.
    >> And maybe, the solution to the quiz is that there is no elegant
    >> solution.

    TP> How about this ? :

    TP> > i <- c(1, 10, 12)
    TP> > j <- c(5, 10, 14)
    TP> > mkseq <- function(i, j) if (i==j) i else call(':', i, j)
    TP> > as.call(c(list(as.name('c')), mapply(i, j, FUN=mkseq)))

Excellent, Tony!
That's just about what I had tried to do myself for half an hour
without getting there.

So, I'd say you've clearly won the quiz. 
Congratulations!

If you can think of an appropriate prize, please say so.
Otherwise, if we meet at the next useR! conference in
Dortmund.. it will be a beer or something like that..

Martin

    TP> c(1:5, 10, 12:14)
    TP> > eval(.Last.value)
    TP> [1]  1  2  3  4  5 10 12 13 14
    >> 

    TP> -- Tony Plate
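
For reference, the two pieces of this thread can be combined roughly as follows: the run detection from inv.seq() as quoted above, with the parse(text = *) step replaced by Tony's call() construction. This is only a minimal sketch, not the sfsmisc implementation. The name inv.seq2 is hypothetical, and using Map() instead of mapply() and converting the run ends to doubles are choices made here for illustration.

```r
## Hypothetical variant of inv.seq(); a sketch, not the sfsmisc code.
inv.seq2 <- function(i) {
    i <- as.integer(i)
    if (length(i) == 0L) return(expression(NULL))  # same as the original
    di1 <- abs(diff(i)) == 1L          # TRUE where neighbours differ by 1
    ## Assumption of this sketch: use doubles so that the call deparses
    ## as 1:11 rather than 1L:11L, as the parse() version does.
    s1 <- as.numeric(i[!c(FALSE, di1)])  # beginnings of runs
    s2 <- as.numeric(i[!c(di1, FALSE)])  # endings of runs
    ## Tony's idea: build each piece as a language object directly
    mkseq <- function(a, b) if (a == b) a else call(":", a, b)
    ## Map() always returns a list, so no simplification can interfere
    as.call(c(list(as.name("c")), Map(mkseq, s1, s2)))
}

v  <- c(1:10, 11, 6, 5, 4, 0, 1)
iv <- inv.seq2(v)
stopifnot(is.call(iv), all(eval(iv) == v))
```

The result is a call whose first element is the symbol c and whose remaining elements are either single numbers or `:` calls, so it can be inspected with iv[[2]] and evaluated with eval(iv), as in the example session above.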