[R] Form groups of lines and select specific values
arun
smartpink111 at yahoo.com
Wed Feb 12 06:37:27 CET 2014
Hi,
I am not sure I understand the question correctly.
Maybe this helps:
dat <- read.table(text="type1 chrx startx endx chry starty endy type2
gain_765 chr15 9681969 9685418 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9765125 9863990 loss_765
gain_780 chr20 9706682 9852347 ch20 9765125 9863990 loss_769
gain_760 chr15 9706682 9852347 chr15 9660912 9712719 loss_1137
gain_760 chr15 9706682 9852347 chr15 9765125 9863990 loss_763", sep="", header=TRUE, stringsAsFactors=FALSE)
# Treat each run of identical chry values as one group
indx <- rle(dat$chry)$lengths
indx1 <- cumsum(indx)        # last row of each run
indx2 <- indx1 - (indx - 1)  # first row of each run
chr <- dat$chrx[indx2]       # chrx from the first row of each run
# Smaller of startx/starty on the first row of each run
start <- pmin(dat$startx[indx2], dat$starty[indx2])
# Larger of endx/endy on the last row of each run
end <- pmax(dat$endx[indx1], dat$endy[indx1])
dat2 <- data.frame(chr, start, end, stringsAsFactors=FALSE)
dat2
#    chr   start     end
#1 chr15 9660912 9863990
#2 chr20 9706682 9863990
#3 chr15 9660912 9863990
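Note that the rle() code groups rows by runs of identical chry values. That matches the example, but it does not follow the type1/type2 links themselves. If the groups really have to come from those links, a minimal sketch along the following lines might be closer to what was asked. It assumes that rows sharing any type1 or type2 string, directly or through a chain, belong to one group. The helper name link_groups is made up for illustration and is not from any package.

```r
# Example data copied from the post.
dat <- read.table(text = "type1 chrx startx endx chry starty endy type2
gain_765 chr15 9681969 9685418 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9765125 9863990 loss_765
gain_780 chr20 9706682 9852347 ch20 9765125 9863990 loss_769
gain_760 chr15 9706682 9852347 chr15 9660912 9712719 loss_1137
gain_760 chr15 9706682 9852347 chr15 9765125 9863990 loss_763",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: returns one group id per row, where rows are linked
# whenever they share a label in 'a' or 'b' (simple union-find on labels).
link_groups <- function(a, b) {
  labels <- unique(c(a, b))
  parent <- seq_along(labels)      # every label starts as its own root
  find <- function(i) {            # walk up to the root of label i
    while (parent[i] != i) i <- parent[i]
    i
  }
  ia <- match(a, labels)
  ib <- match(b, labels)
  for (k in seq_along(a)) {        # merge the two labels found on row k
    ra <- find(ia[k])
    rb <- find(ib[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }
  roots <- vapply(ia, find, numeric(1))
  match(roots, unique(roots))      # ids numbered in order of appearance
}

grp <- link_groups(dat$type1, dat$type2)
first <- !duplicated(grp)          # first row of each group

res <- data.frame(
  chr   = dat$chrx[first],
  start = as.vector(tapply(pmin(dat$startx, dat$starty), grp, min)),
  end   = as.vector(tapply(pmax(dat$endx, dat$endy), grp, max)),
  stringsAsFactors = FALSE
)
res
```

On the example data, grp comes out as 1 1 1 2 3 3 and res has the same three rows as dat2 above. Unlike the rle() version, the minimum and maximum are taken over every row of a group, and the rows of a group do not need to be adjacent. The loop is plain R, so it may be slow on very large inputs.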
A.K.
I would like to form groups of lines based on the two-way
interconnection between the "type1" column and the "type2" column. The
logic is: if a string in "type1" is on the same line as a string in
"type2", they are in the same group. Moreover, if a "type2" string
appears on more than one line, all of those lines are in the same group.
Please take a look at the first 3 lines: "gain_765" and
"loss_1136" are related. However, "loss_1136" is also related to
"gain_766", and subsequently "gain_766" is related to "loss_765". So
this is my group: 1- "gain_765", 2- "loss_1136", 3- "gain_766",
4- "loss_765".
Inside this group I want to make a new line with the string in
"chrx" from the first line of the group, the lowest value in "startx"
and "starty", and the largest value in "endx" and "endy". Here is an
example of my data:
type1 chrx startx endx chry starty endy type2
gain_765 chr15 9681969 9685418 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9660912 9712719 loss_1136
gain_766 chr15 9706682 9852347 chr15 9765125 9863990 loss_765
gain_780 chr20 9706682 9852347 ch20 9765125 9863990 loss_769
gain_760 chr15 9706682 9852347 chr15 9660912 9712719 loss_1137
gain_760 chr15 9706682 9852347 chr15 9765125 9863990 loss_763
For the first group (lines 1 to 3), this is the expected output:
chr start end
chr15 9660912 9863990
Now, please take a look at line 4: "gain_780" is related only
to "loss_769". For this group (just line 4), the expected output
is as follows:
chr start end
chr20 9706682 9863990
Now, for lines 5 and 6, the group is formed by "gain_760", "loss_1137" and "loss_763". In this last case the expected output is:
chr start end
chr15 9660912 9863990
However, I have many such cases across thousands of lines. Therefore, I need all the results in a single output, like this:
chr start end
chr15 9660912 9863990
chr20 9706682 9863990
chr15 9660912 9863990
Cheers.