[R] counting sets of consecutive integers in a vector

Mon Jan 5 04:53:16 CET 2015

Thanks, Peter.  Why not cbind your idea for the first column with my idea 
for the second column and get it done in one line?

v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
M

      [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4
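
For anyone reading along, here is a rough sketch of the same one-liner 
wrapped in a function, with comments on why each half works.  The name 
run_starts and the guard for an empty vector are my own additions for 
illustration, and it assumes v is already sorted with no duplicates:

```r
# Hypothetical helper (name is illustrative); assumes v is sorted and unique.
run_starts <- function(v) {
  # Guard: with an empty vector, v[1] would be NA, so return a 0-row matrix.
  if (length(v) == 0) return(matrix(numeric(0), ncol = 2))
  # A new run begins at position 1 and wherever the step from the
  # previous value is not exactly 1.
  starts <- c(1, which(diff(v) != 1) + 1)
  # Within a run of consecutive integers, v minus its position is constant,
  # and it strictly increases at every gap, so rle() yields the run lengths.
  lens <- rle(v - seq_along(v))$lengths
  cbind(v[starts], lens, deparse.level = 0)
}

M <- run_starts(c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33))
M
```

seq_along(v) is used in place of 1:length(v) only because it behaves 
sensibly when v is empty; for non-empty vectors the two are equivalent.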

I find that pretty appealing and I'll probably stick with it.  It seems 
quite fast.  Here's an example:

# make fairly long vector
v <- sort(unique(round(100000*runif(100000))))
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
    user  system elapsed
    0.03    0.00    0.03

dim(M)
[1] 23212     2
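
The same timing could probably be done a little more tidily with 
system.time(), which wraps the proc.time() bookkeeping.  This is only a 
sketch: the seed is arbitrary (so the vector will differ from the one 
above), and the last line is a sanity check I added, namely that the run 
lengths should sum to the length of the vector:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
v <- sort(unique(round(100000 * runif(100000))))

# system.time() evaluates the expression and reports user/system/elapsed;
# the assignment to M still takes effect in the calling environment.
timing <- system.time(
  M <- cbind(v[c(1, which(diff(v) != 1) + 1)], rle(v - seq_along(v))$lengths)
)
timing["elapsed"]

# Sanity check: every element of v belongs to exactly one run.
sum(M[, 2]) == length(v)
```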

I probably won't be using vectors any longer than that, and this isn't the 
kind of thing that I do over and over again, so that speed is excellent.
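
For the seek()/readBin() use described in my original message below, the 
matrix might be applied roughly like this.  It is a sketch under assumptions 
that are not part of the real data: a temporary file of 8-byte doubles 
stands in for the binary file, and record i is taken to sit at byte offset 
(i - 1) * 8.

```r
# Hypothetical stand-in file: 40 records stored as 8-byte doubles.
f <- tempfile()
writeBin(as.double(1:40), f)

v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
M <- cbind(v[c(1, which(diff(v) != 1) + 1)], rle(v - seq_along(v))$lengths)

con <- file(f, "rb")
out <- unlist(lapply(seq_len(nrow(M)), function(i) {
  # One seek per run: jump to the byte offset of the run's first record.
  seek(con, where = (M[i, 1] - 1) * 8)
  # Read the whole run in one call; n counts items of `size` bytes each.
  readBin(con, what = "double", n = M[i, 2], size = 8)
}))
close(con)
unlink(f)

out
```

Because the stand-in file stores the values 1 to 40 in order, out should 
simply reproduce v, using 4 seeks instead of 11.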

Mike

On Mon, 5 Jan 2015, Peter Alspach wrote:

> Tena koe Mike
>
> An alternative, which is slightly faster:
>
>  diffv <- diff(v)
>  starts <- c(1, which(diffv!=1)+1)
>  cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))
>
> Peter Alspach
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Miller
> Sent: Monday, 5 January 2015 1:03 p.m.
> To: R-Help List
> Subject: [R] counting sets of consecutive integers in a vector
>
> I have a vector of sorted positive integer values (e.g., positive integers after applying sort() and unique()).  For example, this:
>
> c(1,2,5,6,7,8,25,30,31,32,33)
>
> I want to make a matrix from that vector that has two columns: (1) the first value in every run of consecutive integer values, and (2) the corresponding number of consecutive values.  For example:
>
> c(1:20) would become this...
>
> 1  20
>
> ...because there are 20 consecutive integers beginning with 1 and
> c(1,2,5,6,7,8,25,30,31,32,33) would become
>
> 1  2
> 5  4
> 25 1
> 30 4
>
> What would be the best way to accomplish this?  Here is my first effort:
>
> v <- c(1,2,5,6,7,8,25,30,31,32,33)
> L <- rle( v - 1:length(v) )$lengths
> n <- length( L )
> matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)
>
>      [,1] [,2]
> [1,]    1    2
> [2,]    5    4
> [3,]   25    1
> [4,]   30    4
>
> I suppose that works well enough, but there may be a better way, and besides, I wouldn't want to deny anyone here the opportunity to solve a fun puzzle.  ;-)
>
> The use for this is that I will be doing repeated seeks of a binary file to extract data.  seek() gives the starting point and readBin(n=X) gives the number of items to read.  So when there are many consecutive variables to be read, I can multiply the X in n=X by that number instead of doing many different seek() calls.  (The data are in a transposed format where I read in every record for some variable as sequential elements.)  I'm probably not the first person to deal with this.
>
> Best,
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> University of Minnesota
> http://scholar.google.com/citations?user=EV_phq4AAAAJ