[R] 'Record' row values every time the binary value in acollumn changes
William Dunlap
wdunlap at tibco.com
Wed Apr 20 19:30:26 CEST 2011
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
> Sent: Wednesday, April 20, 2011 9:59 AM
> To: baboon2010
> Cc: r-help at r-project.org
> Subject: Re: [R] 'Record' row values every time the binary
> value in acollumn changes
>
> Here is an answer to part 1:
>
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> > # determine where the start/end of each group is
> > # use indices since the size is large
> > startEnd <- lapply(split(seq(nrow(Table))
> + , list(Table[, "Chromosome"], Table[,
> 'binary'])
> + , drop = TRUE
> + )
> + , function(.indx){
> + se <- range(.indx)
> + c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
> + , position_start = unname(Table[se[1L], 'start'])
> + , position_end = unname(Table[se[2L], 'start'])
> + , binary2 = unname(Table[se[1L], 'binary'])
> + )
> + })
> > do.call(rbind, startEnd)
> Chromosome2 position_start position_end binary2
> 1.0 1 20 36 0
> 2.0 2 17 19 0
> 1.1 1 12 18 1
> 2.1 2 12 16 1
The following will likely be quicker way to find where
a column changes values than that lapply() when there
are lots of rows:
f1 <- function (Table) {
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
with(data.frame(Table), {
first <- isFirstInRun(binary)
last <- isLastInRun(binary)
cbind(Chromosome2 = Chromosome[first], position_start = start[first],
position_end = start[last], binary2 = binary[first])
})
}
E.g.,
> f1(Table)
Chromosome2 position_start position_end binary2
[1,] 1 12 18 1
[2,] 1 20 36 0
[3,] 2 12 16 1
[4,] 2 17 19 0
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> >
> >
>
>
> On Wed, Apr 20, 2011 at 5:01 AM, baboon2010
> <nielsvanderaa at live.be> wrote:
> > My question is twofold.
> >
> > Part 1:
> > My data looks like this:
> >
> > (example set, real data has 2*10^6 rows)
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> > Chromosome start binary
> > [1,] 1 12 1
> > [2,] 1 17 1
> > [3,] 1 18 1
> > [4,] 1 20 0
> > [5,] 1 25 0
> > [6,] 1 36 0
> > [7,] 2 12 1
> > [8,] 2 15 1
> > [9,] 2 16 1
> > [10,] 2 17 0
> > [11,] 2 19 0
> >
> > As output I need a shortlist for each binary block: giving
> me the starting
> > and ending position of each block.
> > Which for these example would look like this:
> > Chromosome2 position_start position_end binary2
> > [1,] 1 12 18 1
> > [2,] 1 20 36 0
> > [3,] 2 12 16 1
> > [4,] 2 17 19 0
> >
> > Part 2:
> > Based on the output of part 1, I need to assign the binary
> to rows of
> > another data set. If the position value in this second data
> set falls in one
> > of the blocks defined in the shortlist made in part1,the
> binary value of the
> > shortlist should be assigned to an extra column for this
> row. This would
> > look something like this:
> > Chromosome3 position Value binary3
> > [1,] "1" "12" "a" "1"
> > [2,] "1" "13" "b" "1"
> > [3,] "1" "14" "c" "1"
> > [4,] "1" "15" "d" "1"
> > [5,] "1" "16" "e" "1"
> > [6,] "1" "18" "f" "1"
> > [7,] "1" "20" "g" "0"
> > [8,] "1" "21" "h" "0"
> > [9,] "1" "22" "i" "0"
> > [10,] "1" "23" "j" "0"
> > [11,] "1" "25" "k" "0"
> > [12,] "1" "35" "l" "0"
> > [13,] "2" "12" "m" "1"
> > [14,] "2" "13" "n" "1"
> > [15,] "2" "14" "o" "1"
> > [16,] "2" "15" "p" "1"
> > [17,] "2" "16" "q" "1"
> > [18,] "2" "17" "s" "0"
> > [19,] "2" "18" "d" "0"
> > [20,] "2" "19" "f" "0"
> >
> >
> > Many thanks in advance,
> >
> > Niels
> >
> > --
> > View this message in context:
> http://r.789695.n4.nabble.com/Record-row-values-every-time-the
-binary-value-in-a-collumn-changes-tp3462496p3462496.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list