[R] merge, cbind, or....?

Fri Jul 23 17:36:13 CEST 2004

On Fri, 2004-07-23 at 10:07, Bruno Cutayar wrote:
> Hi,
> i have two data.frame x and y like :
>  > x <- data.frame( num = c(1:10), value = runif(10) )
>  > y <- data.frame( num = c(6:10), value = runif(5) )
> and i want to obtain something like :
> 
>  num.x    value.x     num.y   value.y
>       1 0.38423828    NA 0.2911089
>       2 0.17402507    NA 0.8455208
>       3 0.54443465    NA 0.8782199
>       4 0.04540406    NA 0.3202252
>       5 0.46052426    NA 0.7560559
>       6 0.61385464     6 0.2911089
>       7 0.48274968     7 0.8455208
>       8 0.11961778     8 0.8782199
>       9 0.64531394     9 0.3202252
>     10 0.92052805    10 0.7560559
> 
> with NA in case of missing value for y to x.
> 
> { for this example : i write simply
>  > data.frame(num.x=c(1:10), 
> value.x=x[[2]],num.y=c(rep(NA,5),6:10),value.y=y[[2]]) }
> 
> I didn't find solution in merge(x,y,by="num") : missing rows are no keeping.
> Can't you help me ?
> 
> thanks,
> Bruno

Calling merge() with the argument 'all' set to TRUE gives you the
following (my values differ from yours because runif() was not run
with the same seed):

> merge(x, y, by = "num", all = TRUE)
   num    value.x   value.y
1    1 0.14057955        NA
2    2 0.60850644        NA
3    3 0.63410731        NA
4    4 0.07196253        NA
5    5 0.51869503        NA
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865

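With 'all = TRUE', rows that have no match in the other data frame are
kept, and their missing columns are filled with NA. The default is
FALSE, which returns only the rows whose 'num' appears in both x and y.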
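
To make the effect of 'all' concrete, here is a small sketch that
compares the row counts. The set.seed() call (and the seed 42) is my own
addition so that the numbers are reproducible, and the names inner, full
and left are only illustrative:

```r
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# Default (all = FALSE): only rows whose 'num' is in both x and y
inner <- merge(x, y, by = "num")

# all = TRUE: every row of x and of y, NA where there is no match
full <- merge(x, y, by = "num", all = TRUE)

# all.x = TRUE: every row of x, but no unmatched rows of y
left <- merge(x, y, by = "num", all.x = TRUE)

nrow(inner)               # 5
nrow(full)                # 10
sum(is.na(full$value.y))  # 5
```

Since every 'num' in y also occurs in x, 'all = TRUE' and 'all.x = TRUE'
give the same rows for this data; they would differ if y had a 'num'
that x lacks.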

Note, however, that here value.y is NA for the first five rows rather
than repeating the y values, as it does in your example above. If the
repetition is what you want, you could do something like the following:

> cbind(x, y$value)
   num      value   y$value
1    1 0.14057955 0.3340535
2    2 0.60850644 0.9340489
3    3 0.63410731 0.5417780
4    4 0.07196253 0.2214993
5    5 0.51869503 0.4947865
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865

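which takes advantage of the recycling of y$value, since it has fewer
elements than 'x' has rows. In this case, y$value is repeated twice. This
works only because 10 is an exact multiple of 5; otherwise cbind() stops
with an error about differing numbers of rows.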
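
If you want exactly the four columns from your example, one possible
sketch is to do the recycling explicitly and build num.y by hand. The
seed and the names value.y, num.y and wanted are my own choices for
illustration, not anything from your post:

```r
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# Explicit recycling: repeat y$value until there is one entry per row of x
value.y <- rep(y$value, length.out = nrow(x))

# num.y is NA wherever x$num has no counterpart in y$num
num.y <- ifelse(x$num %in% y$num, x$num, NA)

wanted <- data.frame(num.x = x$num, value.x = x$value,
                     num.y = num.y, value.y = value.y)
wanted
```

Unlike the cbind() call, rep() with 'length.out' also works when the
lengths are not exact multiples, although the recycled values are then
cut off partway through.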

HTH,

Marc Schwartz