[R] 'Record' row values every time the binary value in a column changes
jim holtman
jholtman at gmail.com
Wed Apr 20 18:59:27 CEST 2011
Here is an answer to part 1:
> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
> # determine where the start/end of each group is
> # use indices since the size is large
> startEnd <- lapply(split(seq(nrow(Table))
+ , list(Table[, "Chromosome"], Table[, 'binary'])
+ , drop = TRUE
+ )
+ , function(.indx){
+ se <- range(.indx)
+ c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
+ , position_start = unname(Table[se[1L], 'start'])
+ , position_end = unname(Table[se[2L], 'start'])
+ , binary2 = unname(Table[se[1L], 'binary'])
+ )
+ })
> do.call(rbind, startEnd)
Chromosome2 position_start position_end binary2
1.0 1 20 36 0
2.0 2 17 19 0
1.1 1 12 18 1
2.1 2 12 16 1
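The split() call above groups rows by the (Chromosome, binary) pair rather than by consecutive runs. If one chromosome had two separate runs of the same value (e.g. 1,1,0,0,1,1), they would be merged into a single block, and the result comes back ordered by binary value rather than by position. Below is a minimal base-R sketch that instead marks every row where either column changes from the row above. It assumes, as in the example, that Table is sorted by Chromosome and then by start; the names newBlock, first, last and shortlist are introduced here only for illustration.

```r
# Example data from the question
binary     <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
Chromosome <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
start      <- c(12, 17, 18, 20, 25, 36, 12, 15, 16, 17, 19)
Table      <- cbind(Chromosome, start, binary)

# TRUE on the first row and on every row where Chromosome or binary
# differs from the row above, i.e. wherever a new block begins
newBlock <- c(TRUE, diff(Table[, "Chromosome"]) != 0 |
                    diff(Table[, "binary"]) != 0)
first <- which(newBlock)                   # first row of each block
last  <- c(first[-1] - 1L, nrow(Table))    # last row of each block

shortlist <- cbind(Chromosome2    = Table[first, "Chromosome"],
                   position_start = Table[first, "start"],
                   position_end   = Table[last,  "start"],
                   binary2        = Table[first, "binary"])
shortlist
```

For the example this gives the four rows requested in the question, in the same order. Because it only uses diff() and which() on two columns, with no per-group function call, it should scale reasonably to the 2*10^6 rows mentioned in the question.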
On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvanderaa at live.be> wrote:
> My question is twofold.
>
> Part 1:
> My data looks like this:
>
> (example set; the real data has 2*10^6 rows)
> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
> Chromosome start binary
> [1,] 1 12 1
> [2,] 1 17 1
> [3,] 1 18 1
> [4,] 1 20 0
> [5,] 1 25 0
> [6,] 1 36 0
> [7,] 2 12 1
> [8,] 2 15 1
> [9,] 2 16 1
> [10,] 2 17 0
> [11,] 2 19 0
>
> As output I need a shortlist with one row per binary block, giving the starting
> and ending position of each block.
> For this example it would look like this:
> Chromosome2 position_start position_end binary2
> [1,] 1 12 18 1
> [2,] 1 20 36 0
> [3,] 2 12 16 1
> [4,] 2 17 19 0
>
> Part 2:
> Based on the output of part 1, I need to assign the binary value to rows of
> another data set. If the position value in this second data set falls in one
> of the blocks defined in the shortlist from part 1, the binary value of that
> block should be assigned to an extra column for this row. This would
> look something like this:
> Chromosome3 position Value binary3
> [1,] "1" "12" "a" "1"
> [2,] "1" "13" "b" "1"
> [3,] "1" "14" "c" "1"
> [4,] "1" "15" "d" "1"
> [5,] "1" "16" "e" "1"
> [6,] "1" "18" "f" "1"
> [7,] "1" "20" "g" "0"
> [8,] "1" "21" "h" "0"
> [9,] "1" "22" "i" "0"
> [10,] "1" "23" "j" "0"
> [11,] "1" "25" "k" "0"
> [12,] "1" "35" "l" "0"
> [13,] "2" "12" "m" "1"
> [14,] "2" "13" "n" "1"
> [15,] "2" "14" "o" "1"
> [16,] "2" "15" "p" "1"
> [17,] "2" "16" "q" "1"
> [18,] "2" "17" "s" "0"
> [19,] "2" "18" "d" "0"
> [20,] "2" "19" "f" "0"
>
>
> Many thanks in advance,
>
> Niels
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
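Part 2 is not answered in the reply above. One possible approach, sketched here with base R's findInterval(), looks up each position among the block start positions of its chromosome and keeps that block's binary value only if the position is also no greater than the block's end; positions outside every block stay NA. The data frame names `shortlist` and `other` are hypothetical, built by hand from the example tables in the question, and the sketch assumes the blocks on a chromosome do not overlap (which holds for the part 1 output).

```r
# Shortlist from part 1, typed in from the question's expected output
shortlist <- data.frame(Chromosome2    = c(1, 1, 2, 2),
                        position_start = c(12, 20, 12, 17),
                        position_end   = c(18, 36, 16, 19),
                        binary2        = c(1, 0, 1, 0))

# Hypothetical second data set, rebuilt from the question's example
other <- data.frame(Chromosome3 = rep(c(1, 2), times = c(12, 8)),
                    position    = c(12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 25, 35,
                                    12, 13, 14, 15, 16, 17, 18, 19),
                    Value       = c(letters[1:12],
                                    "m", "n", "o", "p", "q", "s", "d", "f"))

other$binary3 <- NA_real_   # stays NA for positions outside every block
for (chr in unique(shortlist$Chromosome2)) {
  # blocks on this chromosome, sorted by start as findInterval() requires
  blocks <- shortlist[shortlist$Chromosome2 == chr, ]
  blocks <- blocks[order(blocks$position_start), ]
  rows <- which(other$Chromosome3 == chr)
  pos  <- other$position[rows]
  # idx = number of block starts <= pos, i.e. the candidate block (0 = before all)
  idx    <- findInterval(pos, blocks$position_start)
  inside <- idx > 0
  # keep only positions that also lie at or before the candidate block's end
  inside[inside] <- pos[inside] <= blocks$position_end[idx[inside]]
  other$binary3[rows[inside]] <- blocks$binary2[idx[inside]]
}
other
```

The loop runs once per chromosome rather than once per row, and findInterval() does the per-position search in a single vectorized call, so the cost should be dominated by that lookup even for a large second data set.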
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?