[R] algorithm help
William Dunlap
wdunlap at tibco.com
Fri Jan 7 00:52:47 CET 2011
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of array chip
> Sent: Thursday, January 06, 2011 3:29 PM
> To: ted.harding at wlandres.net
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] algorithm help
>
> Thanks very much, Ted. Yes, it does what I need!
>
> I made a routine to do this:
>
> f.fragment<-function(a,b) {
> dat<-as.data.frame(cbind(a,b))
>
> L <- rle(dat$a)$lengths
> V <- rle(dat$a)$values
> pos <- c(1,cumsum(L))
> V1 <- c(-1,V)
> start<-1+pos[V1==0]
> end<-pos[V1==1]
>
> cbind(stretch=1:length(start),start=dat$b[start]
> ,end=dat$b[end],no.of.1s=L[V==1])
>
> }
>
> f.fragment(dat$a,dat$b)
>
> stretch start end no.of.1s
> [1,] 1 13 20 4
> [2,] 2 34 46 2
> [3,] 3 49 77 4
> [4,] 4 97 97 1
You need to be more careful about the first
and last rows in the dataset. I think yours
only works when a starts with 0 and ends with 1.
> f.fragment(c(1,1,0,0), c(11,12,13,14))
stretch start end no.of.1s
[1,] 1 NA 12 2
> f.fragment(c(1,1,0,1), c(11,12,13,14))
stretch start end no.of.1s
[1,] 1 14 12 2
[2,] 1 14 14 1
> f.fragment(c(0,1,0,1), c(11,12,13,14))
stretch start end no.of.1s
[1,] 1 12 12 1
[2,] 2 14 14 1
> f.fragment(c(0,1,0,0), c(11,12,13,14))
stretch start end no.of.1s
[1,] 1 12 12 1
[2,] 2 NA 12 1
> f.fragment(c(1,1,1,1), c(11,12,13,14))
stretch end no.of.1s
[1,] 1 14 4
[2,] 0 14 4
> f.fragment(c(0,0,0,0), c(11,12,13,14))
stretch start
[1,] 1 NA
The following does better. It keeps things as
logical vectors as long as possible, which tends
to work better when dealing with runs.
f <- function(a, b) {
isFirstIn1Run <- c(TRUE, a[-1] != a[-length(a)]) & a==1
isLastIn1Run <- c(a[-1] != a[-length(a)], TRUE) & a==1
data.frame(stretch=seq_len(sum(isFirstIn1Run)),
start = b[isFirstIn1Run],
end = b[isLastIn1Run],
no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run)
+ 1)
}
> f(c(1,1,0,0), c(11,12,13,14))
stretch start end no.of.1s
1 1 11 12 2
> f(c(1,1,0,1), c(11,12,13,14))
stretch start end no.of.1s
1 1 11 12 2
2 2 14 14 1
> f(c(0,1,0,1), c(11,12,13,14))
stretch start end no.of.1s
1 1 12 12 1
2 2 14 14 1
> f(c(0,1,0,0), c(11,12,13,14))
stretch start end no.of.1s
1 1 12 12 1
> f(c(1,1,1,1), c(11,12,13,14))
stretch start end no.of.1s
1 1 11 14 4
> f(c(0,0,0,0), c(11,12,13,14))
[1] stretch start end no.of.1s
<0 rows> (or 0-length row.names)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> John
>
>
>
>
> ________________________________
> From: "ted.harding at wlandres.net" <ted.harding at wlandres.net>
>
> Cc: r-help at stat.math.ethz.ch
> Sent: Thu, January 6, 2011 2:57:47 PM
> Subject: RE: [R] algorithm help
>
> On 06-Jan-11 22:16:38, array chip wrote:
> > Hi, I am seeking help on designing an algorithm to identify the
> > locations of stretches of 1s in a vector of 0s and 1s. Below is
> > an simple example:
> >
> >>
> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
> > ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> >
> >> dat
> > a b
> > 1 0 4
> > 2 0 12
> > 3 1 13
> > 4 1 16
> > 5 1 18
> > 6 1 20
> > 7 0 28
> > 8 0 30
> > 9 1 34
> > 10 1 46
> > 11 0 47
> > 12 1 49
> > 13 1 61
> > 14 1 73
> > 15 1 77
> > 16 0 84
> > 17 0 87
> > 18 0 90
> > 19 0 95
> > 20 1 97
> >
> > In this dataset, "b" is sorted and denotes the location for each
> > number in "a".
> > So I would like to find the starting & ending locations for each
> > stretch of 1s within "a", also counting the number of 1s in each
> > stretch as well.
> > Hope the results from the algorithm would be:
> >
> > stretch start end No.of.1s
> > 1 13 20 4
> > 2 34 46 2
> > 3 49 77 4
> > 4 97 97 1
> >
> > I can imagine using for loops can do the job, but I feel it's not a
> > clever way to do this. Is there an efficient algorithm that can do
> > this fast?
> >
> > Thanks for any suggestions.
> > John
>
> The basic information you need can be got using rle() ("run length
> encoding"). See '?rle'. In your example:
>
> rle(dat$a)
> # Run Length Encoding
> # lengths: int [1:8] 2 4 2 2 1 4 4 1
> # values : num [1:8] 0 1 0 1 0 1 0 1
> ## Note: F -> 0, T -> 1
>
> The following has a somewhat twisted logic at the end, and may
> [[elided Yahoo spam]]
>
> L <- rle(dat$a)$lengths
> V <- rle(dat$a)$values
> pos <- c(1,cumsum(L))
> V1 <- c(-1,V)
> 1+pos[V1==0]
> # [1] 3 9 12 20
> ## Positions in the series dat$a where each run of "T" (i.e. 1)
> ## starts
>
> Hoping this helps,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11 Time: 22:57:44
> ------------------------------ XFMail ------------------------------
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list