[R] joining "one-to-many"

Marc Schwartz marc_schwartz at comcast.net
Tue Feb 17 16:07:02 CET 2009


on 02/17/2009 08:33 AM Monica Pisica wrote:
> Hello list,
>  
> I am wondering if a joining "one-to-many" can be done a little bit
easier. I tried merge function but I was not able to do it, so I end up
using for and if.
> 
> Suppose you have a table with locations, each location repeated
several times, and some attributes at that location. The second table
has the same locations, but only once with a different set of
attributes. I would like to add the second set of attributes to the
first table.

> Example:
>  
> set.seed <- 123

This needs to be set.seed(123)

See ?set.seed

:-)

> loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2))
> val1 <- round(rnorm(10),2)
> val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e")
> t1 <- data.frame(loc, val1, val2)
> t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48))
>  
> # join one-to-many
>  
> n <- nrow(t1)
> m <- nrow(t2)
> t1$val3 <- rep(1, n)
> t1$val4 <- rep(1, n)
>  
> for (i in 1:n) {
>     for (j in 1:m){
>     if (t1$loc[i]==t2$loc[j]) {
>         t1$val3[i] <- as.character(t2$val3[j])
>         t1$val4[i] <- t2$val4[j]
>         }
>     }
> }
>  
> Desired result:
> 
> t1
>    loc  val1 val2 val3 val4
> 1   L1 -0.41    a    m   25
> 2   L1 -0.69    b    m   25
> 3   L1  0.36    c    m   25
> 4   L2  1.11    a    n   67
> 5   L2  0.15    b    n   67
> 6   L2 -0.80    d    n   67
> 7   L2 -0.08    f    n   67
> 8   L2 -1.01    e    n   67
> 9   L3 -1.01    b    p   48
> 10  L3 -2.50    e    p   48
> 
>  
> This code works OK but it is slow if the data frames are actually
bigger than my little example. I hope somebody knows of a better way of
doing these type of things.

> merge(t1, t2, by = "loc")
   loc  val1 val2 val3 val4
1   L1 -0.32    a    m   25
2   L1 -1.50    b    m   25
3   L1 -0.31    c    m   25
4   L2  1.42    a    n   67
5   L2  0.32    b    n   67
6   L2 -0.12    d    n   67
7   L2  0.33    f    n   67
8   L2 -1.74    e    n   67
9   L3  0.88    b    p   48
10  L3  1.88    e    p   48

> system.time(merge(t1, t2, by = "loc"))
   user  system elapsed
  0.004   0.000   0.019


HTH,

Marc Schwartz




More information about the R-help mailing list