[R] how to search to value to another table
Petr Savicky
savicky at praha1.ff.cuni.cz
Mon Jan 31 20:53:40 CET 2011
On Mon, Jan 31, 2011 at 05:35:35PM +0100, Mauluda Akhtar wrote:
> Hello,
>
> I'm a new R user.
>
> I have two different dummy tables with the variable name tb1 and tb2.
Hello.
First, let me put your data into an R command using dput().
tb1 <-
structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("chr1",
"chr2", "chr3"), class = "factor"), V2 = c(22L, 36L, 54L, 77L,
80L, 85L, 99L, 105L, 120L, 130L, 140L, 150L, 172L, 177L, 190L,
200L, 220L, 300L, 310L), V3 = c(23L, 37L, 55L, 78L, 81L, 86L,
100L, 106L, 121L, 131L, 141L, 151L, 173L, 178L, 191L, 201L, 221L,
301L, 311L), V4 = c(3L, 1L, 0L, 1L, 4L, 0L, 1L, 0L, 1L, 1L, 0L,
5L, 0L, 1L, 6L, 8L, 0L, 9L, 10L)), .Names = c("V1", "V2", "V3",
"V4"), class = "data.frame", row.names = c(NA, -19L))
tb2 <-
structure(list(V1 = structure(c(1L, 1L, 2L, 2L, 3L), .Label = c("chr1",
"chr2", "chr3"), class = "factor"), V2 = c(20L, 70L, 90L, 130L,
190L), V3 = c(40L, 80L, 110L, 140L, 230L)), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
>
> The first column is the common field of the both tables. In the first table
> column v3 is always v2+1 while in the second table v2 and v3 hold the range.
> I want to know which rows of tb1 intercept with the range between v2 and v3
> of tb3. I tried but I failed to solve the problem.
I am not sure, whether i understand correctly, what you want. If the
intervals in tb2 have different names, then i would expect that the
values in tb1 should be compared to the row in tb2 with the same name.
However, the names in tb2 are not unique. Can you provide an example
of the required output?
Is the following close to what you expect?
tb1$index <- 0
for (i in seq.int(length=nrow(tb2))) {
tb1$index[tb2$V2[i] <= tb1$V2 & tb1$V3 <= tb2$V3[i]] <- i
}
tb1
V1 V2 V3 V4 index
1 chr1 22 23 3 1
2 chr1 36 37 1 1
3 chr1 54 55 0 0
4 chr1 77 78 1 2
5 chr2 80 81 4 0
6 chr2 85 86 0 0
7 chr2 99 100 1 3
8 chr2 105 106 0 3
9 chr2 120 121 1 0
10 chr2 130 131 1 4
11 chr2 140 141 0 0
12 chr2 150 151 5 0
13 chr3 172 173 0 0
14 chr3 177 178 1 0
15 chr3 190 191 6 5
16 chr3 200 201 8 5
17 chr3 220 221 0 5
18 chr3 300 301 9 0
19 chr3 310 311 10 0
tb2
V1 V2 V3
1 chr1 20 40
2 chr1 70 80
3 chr2 90 110
4 chr2 130 140
5 chr3 190 230
The column tb1$index contains for each row the index of the interval [V2, V3]
in tb2, which contains the values V2, V3 from tb1. For example, line
10 chr2 130 131 1 4
of tb1 contains index 4, because the interval
4 chr2 130 140
in tb2 contains numbers 130 and 131.
Index 0 means that no interval in tb2 contains both the numbers V2, V3 in the
given row of tb1.
Hope this helps.
Petr Savicky.
More information about the R-help
mailing list