[R] 'Record' row values every time the binary value in acollumn changes

Wed Apr 20 19:30:26 CEST 2011

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
> Sent: Wednesday, April 20, 2011 9:59 AM
> To: baboon2010
> Cc: r-help at r-project.org
> Subject: Re: [R] 'Record' row values every time the binary 
> value in acollumn changes
> 
> Here is an answer to part 1:
> 
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> > # determine where the start/end of each group is
> > # use indices since the size is large
> > startEnd <- lapply(split(seq(nrow(Table))
> +                      , list(Table[, "Chromosome"], Table[, 
> 'binary'])
> +                      , drop = TRUE
> +                      )
> +                   , function(.indx){
> +     se <- range(.indx)
> +     c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
> +       , position_start = unname(Table[se[1L], 'start'])
> +       , position_end = unname(Table[se[2L], 'start'])
> +       , binary2 = unname(Table[se[1L], 'binary'])
> +       )
> + })
> > do.call(rbind, startEnd)
>     Chromosome2 position_start position_end binary2
> 1.0           1             20           36       0
> 2.0           2             17           19       0
> 1.1           1             12           18       1
> 2.1           2             12           16       1

The following will likely be quicker way to find where
a column changes values than that lapply() when there
are lots of rows:

  f1 <- function (Table) {
      isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
      isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
      with(data.frame(Table), {
          first <- isFirstInRun(binary)
          last <- isLastInRun(binary)
          cbind(Chromosome2 = Chromosome[first], position_start = start[first], 
              position_end = start[last], binary2 = binary[first])
    })
  }

E.g.,

  > f1(Table)
       Chromosome2 position_start position_end binary2
  [1,]           1             12           18       1
  [2,]           1             20           36       0
  [3,]           2             12           16       1
  [4,]           2             17           19       0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> >
> >
> 
> 
> On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 
> <nielsvanderaa at live.be> wrote:
> > My question is twofold.
> >
> > Part 1:
> > My data looks like this:
> >
> > (example set, real data has 2*10^6 rows)
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> >      Chromosome start binary
> >  [1,]          1    12      1
> >  [2,]          1    17      1
> >  [3,]          1    18      1
> >  [4,]          1    20      0
> >  [5,]          1    25      0
> >  [6,]          1    36      0
> >  [7,]          2    12      1
> >  [8,]          2    15      1
> >  [9,]          2    16      1
> > [10,]          2    17      0
> > [11,]          2    19      0
> >
> > As output I need a shortlist for each binary block: giving 
> me the starting
> > and ending position of each block.
> > Which for these example would look like this:
> >     Chromosome2 position_start position_end binary2
> > [1,]           1             12           18       1
> > [2,]           1             20           36       0
> > [3,]           2             12           16       1
> > [4,]           2             17           19       0
> >
> > Part 2:
> > Based on the output of part 1, I need to assign the binary 
> to rows of
> > another data set. If the position value in this second data 
> set falls in one
> > of the blocks defined in the shortlist made in part1,the 
> binary value of the
> > shortlist should be assigned to an extra column for this 
> row.  This would
> > look something like this:
> >     Chromosome3 position Value binary3
> >  [1,] "1"         "12"     "a"   "1"
> >  [2,] "1"         "13"     "b"   "1"
> >  [3,] "1"         "14"     "c"   "1"
> >  [4,] "1"         "15"     "d"   "1"
> >  [5,] "1"         "16"     "e"   "1"
> >  [6,] "1"         "18"     "f"   "1"
> >  [7,] "1"         "20"     "g"   "0"
> >  [8,] "1"         "21"     "h"   "0"
> >  [9,] "1"         "22"     "i"   "0"
> > [10,] "1"         "23"     "j"   "0"
> > [11,] "1"         "25"     "k"   "0"
> > [12,] "1"         "35"     "l"   "0"
> > [13,] "2"         "12"     "m"   "1"
> > [14,] "2"         "13"     "n"   "1"
> > [15,] "2"         "14"     "o"   "1"
> > [16,] "2"         "15"     "p"   "1"
> > [17,] "2"         "16"     "q"   "1"
> > [18,] "2"         "17"     "s"   "0"
> > [19,] "2"         "18"     "d"   "0"
> > [20,] "2"         "19"     "f"   "0"
> >
> >
> > Many thanks in advance,
> >
> > Niels
> >
> > --
> > View this message in context: 
> http://r.789695.n4.nabble.com/Record-row-values-every-time-the
-binary-value-in-a-collumn-changes-tp3462496p3462496.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 
> -- 
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>