[R] extract data from a data frame field
jim holtman
jholtman at gmail.com
Tue Jun 7 04:58:51 CEST 2011
Here is a start; you can change the column names:
> x
chr start end peak_loc cluster_TC strand peak_TC
1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10
2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8
3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15
4 chr1 565653 565697 chr1:565662..565663,+ 1031 + 28
5 chr1 565861 565922 chr1:565883..565884,+ 316 + 12
6 chr1 566537 566573 chr1:566564..566565,+ 119 + 11
> y <- sub("^.*:([[:digit:]]+)\\.\\.([[:digit:]]+).*", "\\1 \\2", x$peak_loc)
> y
[1] "564644 564645" "565371 565372" "565480 565481" "565662 565663" "565883 565884" "566564 566565"
> y <- strsplit(y, ' ')
> y
[[1]]
[1] "564644" "564645"
[[2]]
[1] "565371" "565372"
[[3]]
[1] "565480" "565481"
[[4]]
[1] "565662" "565663"
[[5]]
[1] "565883" "565884"
[[6]]
[1] "566564" "566565"
> x.new <- cbind(x, do.call(rbind, y))
> x.new
chr start end peak_loc cluster_TC strand peak_TC 1 2
1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10 564644 564645
2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8 565371 565372
3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15 565480 565481
4 chr1 565653 565697 chr1:565662..565663,+ 1031 + 28 565662 565663
5 chr1 565861 565922 chr1:565883..565884,+ 316 + 12 565883 565884
6 chr1 566537 566573 chr1:566564..566565,+ 119 + 11 566564 566565
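The steps above leave the two new columns named 1 and 2, stored as character strings, with peak_loc still in place. A minimal sketch of one way to finish the job follows, assuming the coordinates fit in an integer and that peak_loc always has the form chr:start..end,strand. The names coords, parts and pos are illustrative and not from the original post; start_peak and end_peak are the names requested in the question.

```r
# Rebuild the example data frame from the thread (values copied from the post).
x <- data.frame(
  chr        = rep("chr1", 6),
  start      = c(564620, 565369, 565463, 565653, 565861, 566537),
  end        = c(564649, 565404, 565541, 565697, 565922, 566573),
  peak_loc   = c("chr1:564644..564645,+", "chr1:565371..565372,+",
                 "chr1:565480..565481,+", "chr1:565662..565663,+",
                 "chr1:565883..565884,+", "chr1:566564..566565,+"),
  cluster_TC = c(94, 217, 1214, 1031, 316, 119),
  strand     = rep("+", 6),
  peak_TC    = c(10, 8, 15, 28, 12, 11),
  stringsAsFactors = FALSE
)

# Capture the two coordinates; the dots are escaped so they match literally.
coords <- sub("^.*:([[:digit:]]+)\\.\\.([[:digit:]]+).*$", "\\1 \\2", x$peak_loc)

# Split each "start end" string and stack the pieces into a two-column matrix.
parts <- do.call(rbind, strsplit(coords, " ", fixed = TRUE))

# Locate peak_loc, then splice the two integer columns in at its position.
pos <- match("peak_loc", names(x))
x.new <- data.frame(
  x[seq_len(pos - 1)],                  # columns before peak_loc
  start_peak = as.integer(parts[, 1]),  # assumption: values fit in an integer
  end_peak   = as.integer(parts[, 2]),
  x[-seq_len(pos)],                     # columns after peak_loc
  stringsAsFactors = FALSE
)
```

Converting with as.integer() keeps the coordinates usable in arithmetic and comparisons; if the values could exceed the integer range, as.numeric() would be the safer choice.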
On Mon, Jun 6, 2011 at 8:22 PM, ads pit <deconstructed.morning at gmail.com> wrote:
> Hi all,
> I am given a data frame in which one of the columns holds several pieces of
> information together; see column 4, peak_loc:
> chr start end peak_loc cluster_TC strand peak_TC
> 1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10
> 2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8
> 3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15
> 4 chr1 565653 565697 chr1:565662..565663,+ 1031 + 28
> 5 chr1 565861 565922 chr1:565883..565884,+ 316 + 12
> 6 chr1 566537 566573 chr1:566564..566565,+ 119 + 11
>
>
> I am trying to find out if there's a way to extract the coordinates given
> in the 4th column and replace this column with two others that would have
> the start coord and the end coord. So instead of chr1:564644..564645,+
> I would obtain:
> start_peak end_peak
> 564644 564645
>
> Best,
> nanami
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?