[R] counting sets of consecutive integers in a vector

jim holtman jholtman at gmail.com
Mon Jan 5 02:46:06 CET 2015


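Here is a solution using data.table.  The 'diff' column is a run id that
increases by one whenever the gap to the previous value is not 1: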

> require(data.table)
> v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)  # vector from the original post
> x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
> x
     v diff
 1:  1    0
 2:  2    0
 3:  5    1
 4:  6    1
 5:  7    1
 6:  8    1
 7: 25    2
 8: 30    3
 9: 31    3
10: 32    3
11: 33    3
> x[, list(value = v[1L], length = .N), keyby = 'diff']
   diff value length
1:    0     1      2
2:    1     5      4
3:    2    25      1
4:    3    30      4
> x[, list(value = v[1L], length = .N), keyby = 'diff'][, -1, with = FALSE]  # get rid of 'diff' column
   value length
1:     1      2
2:     5      4
3:    25      1
4:    30      4
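
If you would rather stay in base R and get a matrix back, the same run-id idea can be sketched roughly as follows. The function name consecutive_runs is made up for illustration, and it assumes the input is already sorted with no duplicates, as in the original post.

```r
# Hypothetical helper (name not from the thread): summarise runs of
# consecutive integers in a sorted, duplicate-free vector.
consecutive_runs <- function(v) {
  if (length(v) == 0L) {
    # No values, so no runs: return an empty two-column matrix.
    return(matrix(numeric(0), ncol = 2,
                  dimnames = list(NULL, c("value", "length"))))
  }
  # A new run starts wherever the gap to the previous element is not 1,
  # so the cumulative sum of those breaks is a run id (0, 0, 1, 1, ...).
  grp <- cumsum(c(1, diff(v)) != 1)
  starts <- v[!duplicated(grp)]   # first element of each run
  lens <- rle(grp)$lengths        # grp is non-decreasing, so rle() counts each run
  cbind(value = starts, length = lens)
}

v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
consecutive_runs(v)
consecutive_runs(1:20)
```

For the example vector this should give the same four rows as the data.table result above, and a single row (1, 20) for 1:20.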


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Jan 4, 2015 at 7:03 PM, Mike Miller <mbmiller+l at gmail.com> wrote:

> I have a vector of sorted positive integer values (e.g., positive integers
> after applying sort() and unique()).  For example, this:
>
> c(1,2,5,6,7,8,25,30,31,32,33)
>
> I want to make a matrix from that vector that has two columns: (1) the
> first value in every run of consecutive integer values, and (2) the
> corresponding number of consecutive values.  For example:
>
> c(1:20) would become this...
>
> 1  20
>
> ...because there are 20 consecutive integers beginning with 1 and
> c(1,2,5,6,7,8,25,30,31,32,33) would become
>
> 1  2
> 5  4
> 25 1
> 30 4
>
> What would be the best way to accomplish this?  Here is my first effort:
>
> v <- c(1,2,5,6,7,8,25,30,31,32,33)
> L <- rle( v - 1:length(v) )$lengths
> n <- length( L )
> matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)
>
>      [,1] [,2]
> [1,]    1    2
> [2,]    5    4
> [3,]   25    1
> [4,]   30    4
>
> I suppose that works well enough, but there may be a better way, and
> besides, I wouldn't want to deny anyone here the opportunity to solve a fun
> puzzle.  ;-)
>
> The use for this is that I will be doing repeated seeks of a binary file
> to extract data.  seek() sets the starting point and readBin(n=X) sets
> the number of items to read.  So when there are many consecutive variables
> to be read, I can multiply the X in n=X by that number instead of doing
> many different seek() calls.  (The data are in a transposed format where I
> read in every record for some variable as sequential elements.)  I'm
> probably not the first person to deal with this.
>
> Best,
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> University of Minnesota
> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
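
For the seek()/readBin() use described above, here is a rough sketch of how the run starts and lengths might drive the reads. It is only an illustration under simplifying assumptions: the file holds 8-byte doubles, each wanted index is a 1-based position of a single value (in the transposed layout you describe you would scale both the offset and n by the number of records per variable), and read_by_runs is a made-up name.

```r
# Hypothetical helper: read the values at 1-based positions `idx` from a
# binary file of fixed-size doubles, using one seek() per consecutive run.
read_by_runs <- function(path, idx, size = 8L) {
  idx <- sort(unique(idx))
  grp <- cumsum(c(1, diff(idx)) != 1)   # run id for each index
  starts <- idx[!duplicated(grp)]       # first index of each run
  lens <- rle(grp)$lengths              # number of indices in each run
  con <- file(path, open = "rb")
  on.exit(close(con))
  out <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    # Jump to the byte offset of the run's first value, then read the whole run.
    seek(con, where = (starts[i] - 1) * size, origin = "start")
    out[[i]] <- readBin(con, what = "double", n = lens[i], size = size)
  }
  unlist(out)
}

# Small demo on a temporary file holding 10, 20, ..., 400.
tmp <- tempfile(fileext = ".bin")
writeBin(as.double(1:40) * 10, tmp)
idx <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
res <- read_by_runs(tmp, idx)
unlink(tmp)
```

With the example vector this does four seek() calls instead of eleven; whether that matters in practice depends on the file size and the storage it sits on.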
