[R] Table Intersection
Martin Morgan
mtmorgan at fhcrc.org
Wed Jan 18 19:16:51 CET 2012
On 01/18/2012 07:25 AM, rantree wrote:
> I've got two tables.
>
> First one (table1):
>
> ID  chrom start  end
> Ex1     2   152  180
> Ex2    10  2000 2220
> Ex3    15  3000 4000
>
> Second one (table2):
>
> chrom location name
>     2      160 Alv
>     2      190 GNN
>     2      100 ARg
>    10      210 GGG
>    15     3200 ADSA
>
> What I have to do is add the name column to table1 when the location of
> the name is between start and end, and the chrom must be the same. This
> will be the result:
>
> ID  chrom start  end name
> Ex1     2   152  180 Alv
> Ex2    10  2000 2220 GGG
> Ex3    15  3000 4000 ADSA
>
>
> How can I do this?
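Install the Bioconductor package GenomicRanges:
## biocLite() is defunct; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GenomicRanges")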
then
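library(GenomicRanges)
## table1: one range per exon, with the exon ID as metadata
t1 <- GRanges(c("2", "10", "15"),
              IRanges(c(152, 2000, 3000),
                      c(180, 2220, 4000)),
              Id=c("Ex1", "Ex2", "Ex3"))
## table2: each location becomes a width-1 range; GGG is placed at 2010
## here because the posted 210 lies outside Ex2 (2000-2220)
t2 <- GRanges(c("2", "2", "2", "10", "15"),
              IRanges(c(160, 190, 100, 2010, 3200),
                      width=1),
              Name=c("Alv", "GNN", "ARg", "GGG", "ADSA"))
## index of the first t2 range overlapping each t1 range (NA if none);
## in current GenomicRanges, match() finds only identical ranges, not overlaps
idx <- findOverlaps(t1, t2, select="first")
values(t1)$Name <- values(t2)$Name[idx]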
leading to
> t1
GRanges with 3 ranges and 2 elementMetadata values:
      seqnames       ranges strand |          Id        Name
         <Rle>    <IRanges>  <Rle> | <character> <character>
  [1]        2 [ 152,  180]      * |         Ex1         Alv
  [2]       10 [2000, 2220]      * |         Ex2         GGG
  [3]       15 [3000, 4000]      * |         Ex3        ADSA
  ---
  seqlengths:
   10 15  2
   NA NA NA
> as.data.frame(t1)
  seqnames start  end width strand  Id Name
1        2   152  180    29      * Ex1  Alv
2       10  2000 2220   221      * Ex2  GGG
3       15  3000 4000  1001      * Ex3 ADSA
GenomicRanges supports many other sequence-related operations as well.
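For readers who would rather not install Bioconductor, the same rule (same chrom, location between start and end inclusive) can be sketched in plain base R. This is a swapped-in technique, not the GenomicRanges approach: the data frames table1 and table2 are rebuilt by hand from the posted tables, GGG is placed at 2010 as in the t2 example, and every matching name is kept (comma-separated) rather than only the first. The loop compares each exon against every location, so it is only meant for small tables.

```r
## Rebuild the posted tables by hand (GGG at 2010, as in the GRanges example)
table1 <- data.frame(ID = c("Ex1", "Ex2", "Ex3"),
                     chrom = c(2, 10, 15),
                     start = c(152, 2000, 3000),
                     end = c(180, 2220, 4000),
                     stringsAsFactors = FALSE)
table2 <- data.frame(chrom = c(2, 2, 2, 10, 15),
                     location = c(160, 190, 100, 2010, 3200),
                     name = c("Alv", "GNN", "ARg", "GGG", "ADSA"),
                     stringsAsFactors = FALSE)

## For each exon, find table2 rows on the same chrom whose location lies
## in [start, end]; keep all matching names, or NA when there are none
table1$name <- vapply(seq_len(nrow(table1)), function(i) {
  ok <- table2$chrom == table1$chrom[i] &
    table2$location >= table1$start[i] &
    table2$location <= table1$end[i]
  if (any(ok)) paste(table2$name[ok], collapse = ",") else NA_character_
}, character(1))

table1
```

With these tables, table1$name becomes c("Alv", "GGG", "ADSA"). With the posted location 210 instead of 2010, Ex2 would get NA, since 210 is outside 2000-2220.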
Hope that helps,
Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793