(Ted Harding)
ted.harding at wlandres.net
Thu Jan 6 23:57:47 CET 2011
On 06-Jan-11 22:16:38, array chip wrote:
> Hi, I am seeking help on designing an algorithm to identify the
> locations of stretches of 1s in a vector of 0s and 1s. Below is
> a simple example:
>
>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
> ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
>
>> dat
>    a  b
> 1  0  4
> 2  0 12
> 3  1 13
> 4  1 16
> 5  1 18
> 6  1 20
> 7  0 28
> 8  0 30
> 9  1 34
> 10 1 46
> 11 0 47
> 12 1 49
> 13 1 61
> 14 1 73
> 15 1 77
> 16 0 84
> 17 0 87
> 18 0 90
> 19 0 95
> 20 1 97
>
> In this dataset, "b" is sorted and denotes the location for each
> number in "a".
> So I would like to find the starting & ending locations for each
> stretch of 1s within "a", also counting the number of 1s in each
> stretch as well.
> Hope the results from the algorithm would be:
>
> stretch  start  end  No.of.1s
>    1       13    20     4
>    2       34    46     2
>    3       49    77     4
>    4       97    97     1
>
> I can imagine using for loops can do the job, but I feel it's not a
> clever way to do this. Is there an efficient algorithm that can do
> this fast?
>
> Thanks for any suggestions.
> John
The basic information you need can be got using rle() ("run length
encoding"). See '?rle'. In your example:

  rle(dat$a)
  # Run Length Encoding
  #   lengths: int [1:8] 2 4 2 2 1 4 4 1
  #   values : num [1:8] 0 1 0 1 0 1 0 1
  ## Note: F -> 0, T -> 1

The following has a somewhat twisted logic at the end, and may
be flawed, but you can probably adapt it!
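
  r <- rle(dat$a)
  L <- r$lengths
  V <- r$values
  ## pos[i] is the index just before run i begins (0 for the first run)
  pos <- c(0, cumsum(L))[seq_along(L)]
  1 + pos[V == 1]
  # [1] 3 9 12 20
  ## Positions in the series dat$a where each run of "T" (i.e. 1)
  ## starts. Selecting on V == 1 also copes with a series that
  ## begins with 1 or ends with 0.
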
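
To get from those positions to the table you asked for, the same
rle() output can be pushed a little further. What follows is only a
sketch: the function name find_stretches() is my own invention, and it
assumes "a" holds only 0/1 (or FALSE/TRUE) with no NAs, and that "b"
has the same length as "a".

```r
# Hypothetical helper (not part of base R): summarise each run of 1s.
# `a` is a 0/1 (or logical) vector, `b` the matching sorted locations.
find_stretches <- function(a, b) {
  r <- rle(as.integer(a))
  end_idx   <- cumsum(r$lengths)          # index of the last element of each run
  start_idx <- end_idx - r$lengths + 1    # index of the first element of each run
  ones <- r$values == 1                   # keep only the runs of 1s
  data.frame(
    stretch  = seq_len(sum(ones)),
    start    = b[start_idx[ones]],
    end      = b[end_idx[ones]],
    No.of.1s = r$lengths[ones]
  )
}

# The example data from the question, built directly as 0/1.
dat <- data.frame(
  a = c(0,0,1,1,1,1,0,0,1,1,0,1,1,1,1,0,0,0,0,1),
  b = c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)
)
res <- find_stretches(dat$a, dat$b)
print(res)
```

On the example data this should give starts 13, 34, 49, 97, ends
20, 46, 77, 97 and counts 4, 2, 4, 1, which is the table in your
message. Everything is vectorised, so no explicit loop is needed.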
Hoping this helps,
Ted.
