[R] 'Record' row values every time the binary value in a column changes
Phil Spector
spector at stat.berkeley.edu
Wed Apr 20 19:44:53 CEST 2011
Here's one way to do part 1:
> rr = rle(Table[,'binary'])
> cc = cumsum(rr$lengths)+1
> thestarts = c(1,cc[cc<=nrow(Table)])
> theends = cc-1
> answer = cbind(Table[thestarts,'Chromosome'],Table[thestarts,'start'],Table[theends,'start'],rr$values)
> answer
     [,1] [,2] [,3] [,4]
[1,]    1   12   18    1
[2,]    1   20   36    0
[3,]    2   12   16    1
[4,]    2   17   19    0
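One caveat with the rle() approach: it looks only at the binary column, so two adjacent chromosomes that happen to end and begin with the same binary value would be merged into a single block. The example data does not trigger this, but real data might. A minimal sketch of a variant that also starts a new block whenever the chromosome changes, assuming the rows are already ordered by chromosome and position (the names `is_start` and `blocks` are only illustrative):

```r
# Same toy data as in the original question
binary     <- c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome <- c(1,1,1,1,1,1,2,2,2,2,2)
start      <- c(12,17,18,20,25,36,12,15,16,17,19)
Table      <- cbind(Chromosome, start, binary)

n <- nrow(Table)

# A new block begins at row 1 and wherever the chromosome OR the
# binary value differs from the previous row
is_start <- c(TRUE,
              Table[-1, "Chromosome"] != Table[-n, "Chromosome"] |
              Table[-1, "binary"]     != Table[-n, "binary"])

thestarts <- which(is_start)
theends   <- c(thestarts[-1] - 1, n)   # each block ends just before the next one starts

blocks <- cbind(Chromosome     = Table[thestarts, "Chromosome"],
                position_start = Table[thestarts, "start"],
                position_end   = Table[theends,   "start"],
                binary         = Table[thestarts, "binary"])
blocks
```

Everything here is vectorized, so it should scale to a couple of million rows about as well as the rle() version.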
If I understand you correctly, here's a way to do part 2:
> Next = matrix(c(rep(1,12),rep(2,8),c(12,13,14,15,16,18,20,21,22,23,25,35,12,13,14,15,16,17,18,19)),ncol=2)
> apply(Next,1,function(x)answer[answer[,1]==x[1] & x[2] >= answer[,2] & x[2] <= answer[,3],4])
[1] 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0
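Two things to watch with the apply() call: it scans every block for every row, which may be slow on large data, and if a position falls in a gap between blocks (for example position 19 on chromosome 1) the function returns a zero-length vector, so the result becomes a list rather than a vector. A possible alternative is sketched below using findInterval(), assuming the blocks do not overlap within a chromosome. The helper name `lookup_binary` is hypothetical, and `answer` is retyped from the part 1 output so the block runs on its own:

```r
# Block table from part 1: chromosome, block start, block end, binary value
answer <- cbind(c(1, 1, 2, 2), c(12, 20, 12, 17), c(18, 36, 16, 19), c(1, 0, 1, 0))

# Hypothetical helper: binary value of the block containing each
# (chromosome, position) pair, or NA if no block contains it
lookup_binary <- function(chrom, pos, blocks) {
  out <- rep(NA_real_, length(pos))
  for (ch in unique(chrom)) {
    b <- blocks[blocks[, 1] == ch, , drop = FALSE]
    if (nrow(b) == 0) next                   # no blocks on this chromosome
    b   <- b[order(b[, 2]), , drop = FALSE]  # findInterval needs sorted starts
    sel <- which(chrom == ch)
    idx <- findInterval(pos[sel], b[, 2])    # last block starting at or before pos; 0 if none
    hit <- idx > 0
    hit[hit] <- pos[sel][hit] <= b[idx[hit], 3]  # keep only positions before the block end
    out[sel[hit]] <- b[idx[hit], 4]
  }
  out
}

Next <- matrix(c(rep(1, 12), rep(2, 8),
                 c(12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 25, 35,
                   12, 13, 14, 15, 16, 17, 18, 19)), ncol = 2)
binary3  <- lookup_binary(Next[, 1], Next[, 2], answer)
gap_case <- lookup_binary(1, 19, answer)   # 19 lies between the blocks 12-18 and 20-36
```

The loop runs once per chromosome rather than once per row, and positions outside every block come back as NA instead of changing the shape of the result.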
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
> On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvanderaa at live.be> wrote:
>> My question is twofold.
>>
>> Part 1:
>> My data looks like this:
>>
>> (example set, real data has 2*10^6 rows)
>> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
>> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
>> start<-c(12,17,18,20,25,36,12,15,16,17,19)
>> Table<-cbind(Chromosome,start,binary)
>>       Chromosome start binary
>>  [1,]          1    12      1
>>  [2,]          1    17      1
>>  [3,]          1    18      1
>>  [4,]          1    20      0
>>  [5,]          1    25      0
>>  [6,]          1    36      0
>>  [7,]          2    12      1
>>  [8,]          2    15      1
>>  [9,]          2    16      1
>> [10,]          2    17      0
>> [11,]          2    19      0
>>
>> As output I need a shortlist with one row per binary block, giving the
>> starting and ending position of each block.
>> For this example, it would look like this:
>>      Chromosome2 position_start position_end binary2
>> [1,]           1             12           18       1
>> [2,]           1             20           36       0
>> [3,]           2             12           16       1
>> [4,]           2             17           19       0
>>
>> Part 2:
>> Based on the output of part 1, I need to assign the binary value to the rows
>> of another data set. If the position value in this second data set falls
>> within one of the blocks defined in the shortlist from part 1, the binary
>> value of that block should be added as an extra column for this row. This
>> would look something like this:
>>       Chromosome3 position Value binary3
>>  [1,] "1"         "12"     "a"   "1"
>>  [2,] "1"         "13"     "b"   "1"
>>  [3,] "1"         "14"     "c"   "1"
>>  [4,] "1"         "15"     "d"   "1"
>>  [5,] "1"         "16"     "e"   "1"
>>  [6,] "1"         "18"     "f"   "1"
>>  [7,] "1"         "20"     "g"   "0"
>>  [8,] "1"         "21"     "h"   "0"
>>  [9,] "1"         "22"     "i"   "0"
>> [10,] "1"         "23"     "j"   "0"
>> [11,] "1"         "25"     "k"   "0"
>> [12,] "1"         "35"     "l"   "0"
>> [13,] "2"         "12"     "m"   "1"
>> [14,] "2"         "13"     "n"   "1"
>> [15,] "2"         "14"     "o"   "1"
>> [16,] "2"         "15"     "p"   "1"
>> [17,] "2"         "16"     "q"   "1"
>> [18,] "2"         "17"     "s"   "0"
>> [19,] "2"         "18"     "d"   "0"
>> [20,] "2"         "19"     "f"   "0"
>>
>>
>> Many thanks in advance,
>>
>> Niels
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>