[R] Efficient way to find consecutive integers in vector?
Martin Maechler
maechler at stat.math.ethz.ch
Sat Dec 22 12:29:05 CET 2007
>>>>> "TP" == Tony Plate <tplate at acm.org>
>>>>> on Fri, 21 Dec 2007 18:17:18 -0700 writes:
TP> Martin Maechler wrote:
>>>>>>> "MS" == Marc Schwartz <marc_schwartz at comcast.net>
>>>>>>> on Thu, 20 Dec 2007 16:33:54 -0600 writes:
>>
MS> On Thu, 2007-12-20 at 22:43 +0100, Johannes Graumann wrote:
>> >> Hi all,
>> >>
>> >> Does anybody have a magic trick handy to isolate directly consecutive
>> >> integers from something like this:
>> >> c(1,2,3,4,7,8,9,10,12,13)
>> >>
>> >> The result should be, that groups 1-4, 7-10 and 12-13 are consecutive
>> >> integers ...
>> >>
>> >> Thanks for any hints, Joh
>>
MS> Not fully tested, but here is one possible approach:
>>
>> >> Vec
MS> [1] 1 2 3 4 7 8 9 10 12 13
>>
MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
>>
>> >> Breaks
MS> [1] 0 4 8 10
>>
>> >> sapply(seq(length(Breaks) - 1),
MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
MS> [[1]]
MS> [1] 1 2 3 4
>>
MS> [[2]]
MS> [1] 7 8 9 10
>>
MS> [[3]]
MS> [1] 12 13
>>
>>
>>
MS> For a quick test, I tried it on another vector:
>>
>>
MS> set.seed(1)
MS> Vec <- sort(sample(20, 15))
>>
>> >> Vec
MS> [1] 1 2 3 4 5 6 8 9 10 11 14 15 16 19 20
>>
MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
>>
>> >> Breaks
MS> [1] 0 6 10 13 15
>>
>> >> sapply(seq(length(Breaks) - 1),
MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
MS> [[1]]
MS> [1] 1 2 3 4 5 6
>>
MS> [[2]]
MS> [1] 8 9 10 11
>>
MS> [[3]]
MS> [1] 14 15 16
>>
MS> [[4]]
MS> [1] 19 20
>>
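A more compact variant of the same idea, offered here only as a sketch and not as part of Marc's post: a new group starts wherever diff(Vec) != 1, so the cumulative sum of those break flags gives a group id that can be passed to split(). The name `runs` is made up for illustration. Like the code above, this assumes an increasing vector. Unlike sapply(), split() always returns a list, even when all runs happen to have the same length (sapply() would simplify those into a matrix).

```r
Vec <- c(1, 2, 3, 4, 7, 8, 9, 10, 12, 13)

# TRUE marks the first element of each run; cumsum() turns the
# flags into a run id (1, 1, 1, 1, 2, 2, ...).
run_id <- cumsum(c(TRUE, diff(Vec) != 1))

# One list element per run of consecutive integers.
runs <- split(Vec, run_id)

# Equal-length runs still come back as a list, not a matrix.
runs2 <- split(c(1, 2, 5, 6), cumsum(c(TRUE, diff(c(1, 2, 5, 6)) != 1)))
```

The first and last element of each run, e.g. sapply(runs, range), give the 1-4, 7-10 and 12-13 bounds asked for in the original question.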
>> Seems ok, but ``only works for increasing sequences''.
>> More than 12 years ago, I had encountered the same problem and
>> solved it like this:
>>
>> In package 'sfsmisc', there has been the function inv.seq(),
>> named for "inversion of seq()",
>> which does this too, currently returning an expression,
>> but returning a call in the development version of sfsmisc:
>>
>> Its definition is currently
>>
>> inv.seq <- function(i) {
>> ## Purpose: 'Inverse seq': Return a short expression for the 'index' `i'
>> ## --------------------------------------------------------------------
>> ## Arguments: i: vector of (usually increasing) integers.
>> ## --------------------------------------------------------------------
>> ## Author: Martin Maechler, Date: 3 Oct 95, 18:08
>> ## --------------------------------------------------------------------
>> ## EXAMPLES: cat(rr <- inv.seq(c(3:12, 20:24, 27, 30:33)),"\n"); eval(rr)
>> ## r2 <- inv.seq(c(20:13, 3:12, -1:-4, 27, 30:31)); eval(r2); r2
>> li <- length(i <- as.integer(i))
>> if(li == 0) return(expression(NULL))
>> else if(li == 1) return(as.expression(i))
>> ##-- now have: length(i) >= 2
>> di1 <- abs(diff(i)) == 1 #-- those are just simple sequences n1:n2 !
>> s1 <- i[!c(FALSE,di1)] # beginnings
>> s2 <- i[!c(di1,FALSE)] # endings
>>
>> ## using text & parse {cheap and dirty} :
>> mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>> parse(text =
>> paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>> srcfile = NULL)[[1]]
>> }
>>
>> with example code
>>
>> > v <- c(1:10,11,6,5,4,0,1)
>> > (iv <- inv.seq(v))
>> c(1:11, 6:4, 0:1)
>> > stopifnot(identical(eval(iv), as.integer(v)))
>> > iv[[2]]
>> 1:11
>> > str(iv)
>> language c(1:11, 6:4, 0:1)
>> > str(iv[[2]])
>> language 1:11
>> >
>>
>>
>> Now, given that this stems from 1995, I should be excused for
>> using parse(text = *) [see fortune(106) if you don't understand].
>>
>> However, doing this differently by constructing the resulting
>> language object directly {using substitute(), as.symbol(),
>> as.expression() ... etc}
>> seems not quite trivial.
>>
>> So here's the Friday afternoon / Christmas break quiz:
>>
>> What's the most elegant way
>> to replace the last statements in inv.seq()
>> ------------------------------------------------------------------------
>> ## using text & parse {cheap and dirty} :
>> mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>> parse(text =
>> paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>> srcfile = NULL)[[1]]
>> ------------------------------------------------------------------------
>>
>> by code that does not use parse (or source() or similar) ???
>>
>> I don't have an answer yet, at least not at all an elegant one.
>> And maybe, the solution to the quiz is that there is no elegant
>> solution.
TP> How about this ? :
>> i <- c(1, 10, 12)
>> j <- c(5, 10, 14)
>> mkseq <- function(i, j) if (i==j) i else call(':', i, j)
>> as.call(c(list(as.name('c')), mapply(i, j, FUN=mkseq)))
Excellent, Tony!
That's just about what I had tried to do myself for half an hour
and didn't get around to..
So, I'd say you've clearly won the quiz.
Congratulations!
If you can think of an appropriate prize, please say so.
Otherwise, if we meet at the next useR! conference in
Dortmund.. it will be a beer or something like that..
Martin
TP> c(1:5, 10, 12:14)
>> eval(.Last.value)
TP> [1] 1 2 3 4 5 10 12 13 14
>>
TP> -- Tony Plate
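For reference, here is one way Tony's answer might be dropped into the body of inv.seq() quoted earlier in the thread. This is only a sketch: the name `inv_seq_call` is hypothetical, the run detection is copied unchanged from the 1995 code, and the released sfsmisc function may well differ. SIMPLIFY = FALSE is an added precaution so that mapply() always hands back a list.

```r
inv_seq_call <- function(i) {
  i <- as.integer(i)
  li <- length(i)
  if (li == 0) return(NULL)   # assumption: plain NULL for empty input
  if (li == 1) return(i)
  ## now have: length(i) >= 2
  di1 <- abs(diff(i)) == 1    # steps of +1 or -1 continue a run
  s1 <- i[!c(FALSE, di1)]     # beginnings
  s2 <- i[!c(di1, FALSE)]     # endings

  ## build the language object directly, without parse(text = *)
  mkseq <- function(i, j) if (i == j) i else call(":", i, j)
  as.call(c(list(as.name("c")), mapply(mkseq, s1, s2, SIMPLIFY = FALSE)))
}

v <- c(1:10, 11, 6, 5, 4, 0, 1)
iv <- inv_seq_call(v)
stopifnot(identical(eval(iv), as.integer(v)))
```

Because the arguments of each ":" call are integers here, the deparsed call may print differently from the parse()-based version, but it evaluates to the same vector.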