[R] merge, cbind, or....?
Marc Schwartz
MSchwartz at MedAnalytics.com
Fri Jul 23 17:36:13 CEST 2004
On Fri, 2004-07-23 at 10:07, Bruno Cutayar wrote:
> Hi,
> i have two data.frame x and y like :
> > x <- data.frame( num = c(1:10), value = runif(10) )
> > y <- data.frame( num = c(6:10), value = runif(5) )
> and i want to obtain something like :
>
> num.x    value.x num.y   value.y
>     1 0.38423828    NA 0.2911089
>     2 0.17402507    NA 0.8455208
>     3 0.54443465    NA 0.8782199
>     4 0.04540406    NA 0.3202252
>     5 0.46052426    NA 0.7560559
>     6 0.61385464     6 0.2911089
>     7 0.48274968     7 0.8455208
>     8 0.11961778     8 0.8782199
>     9 0.64531394     9 0.3202252
>    10 0.92052805    10 0.7560559
>
> with NA in case of missing value for y to x.
>
> { for this example : i write simply
> > data.frame(num.x=c(1:10),
> value.x=x[[2]],num.y=c(rep(NA,5),6:10),value.y=y[[2]]) }
>
> I didn't find solution in merge(x,y,by="num") : missing rows are no keeping.
> Can't you help me ?
>
> thanks,
> Bruno
Calling merge() with the argument 'all' set to TRUE will get you the
following (my values differ from yours because runif() was not seeded
with the same value):
> merge(x, y, by = "num", all = TRUE)
   num    value.x   value.y
1    1 0.14057955        NA
2    2 0.60850644        NA
3    3 0.63410731        NA
4    4 0.07196253        NA
5    5 0.51869503        NA
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865
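With 'all = TRUE', rows of either data frame that have no match in the
other are kept, and the columns they lack are filled with NA. The default
is 'all = FALSE', which returns only the rows that match in both.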
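To make the comparison reproducible, here is a minimal sketch. The seed
value 42 is an arbitrary assumption of mine and is not from the original
post, and the object names 'inner', 'full' and 'left' are only
illustrative. It also shows 'all.x = TRUE', which keeps every row of 'x'
only; with these data it keeps the same ten rows as 'all = TRUE',
because every 'num' in 'y' also occurs in 'x'.

```r
# Reproducible sketch; the seed 42 is an arbitrary choice (assumption)
set.seed(42)
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# Default (all = FALSE): only rows whose 'num' occurs in both data frames
inner <- merge(x, y, by = "num")

# all = TRUE: keep unmatched rows from both sides, padding with NA
full <- merge(x, y, by = "num", all = TRUE)

# all.x = TRUE: keep every row of x, drop rows found only in y
left <- merge(x, y, by = "num", all.x = TRUE)

nrow(inner)                # 5 rows (num 6 to 10)
nrow(full)                 # 10 rows
sum(is.na(full$value.y))   # 5 rows of x have no partner in y
```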
Note, however, that in the merge() result the value.y column is not
repeated in the first five rows, as it is in your example above. If that
is what you want, you could do something like the following:
> cbind(x, y$value)
   num      value   y$value
1    1 0.14057955 0.3340535
2    2 0.60850644 0.9340489
3    3 0.63410731 0.5417780
4    4 0.07196253 0.2214993
5    5 0.51869503 0.4947865
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865
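This takes advantage of the recycling of y$value, since it has fewer
elements than 'x' has rows. In this case its five values are used twice
to fill the ten rows. Recycling of this kind only works when the number
of rows is an exact multiple of the vector's length.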
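If you also want the num.y column from your desired output, one possible
sketch is below. It looks up each x$num in y$num with match() and spells
out the recycling with rep(), so that the column gets a clean name
instead of 'y$value'. The names 'idx' and 'res' and the seed 42 are my
own assumptions for illustration, not part of your code.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# Position of each x$num within y$num; NA where x$num is absent from y
idx <- match(x$num, y$num)

res <- data.frame(
  num.x   = x$num,
  value.x = x$value,
  num.y   = y$num[idx],                         # NA for 1 to 5, then 6 to 10
  value.y = rep(y$value, length.out = nrow(x))  # explicit recycling
)
res
```

Using rep() with 'length.out' makes the repetition visible in the code,
rather than relying on the implicit recycling done by cbind().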
HTH,
Marc Schwartz