[R] Sort problem in merge()
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Mar 6 23:30:44 CET 2006
One other idea: one could use match() instead of merge():
> # tmp1a and tmp2a from below
> cbind(tmp1a, tmp2a[match(tmp1a$col1, tmp2a$col1), -1, drop = FALSE])
col1 col2
1 A NA
2 A NA
3 C 1
4 C 1
5 0 NA
6 0 NA
This avoids having to muck with reordering the rows and resetting the rownames.
Like the prior solution, it assumes that the elements of tmp2a$col1
are unique; if they are not, match() silently uses the first matching row.
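The match() one-liner above can be packaged as a small left-lookup helper. This is a minimal sketch; the function name lookup_join() and its arguments are hypothetical (not part of base R), and like the one-liner it assumes the key column of the lookup table is unique, because match() returns only the first hit.

```r
# Hypothetical helper (not in base R): left-join y onto x by one key column,
# keeping x's rows in their original order and with x's rownames.
# Assumes y[[key]] has no duplicates; otherwise only the first match is used.
lookup_join <- function(x, y, key) {
  # Position in y of the first row whose key equals each key of x (NA if none).
  # match() converts factors to character, so it compares labels, not codes.
  idx <- match(x[[key]], y[[key]])
  # Pull the non-key columns of y in that order; NA positions give NA rows.
  extra <- y[idx, setdiff(names(y), key), drop = FALSE]
  rownames(extra) <- NULL  # discard the "NA", "NA.1", ... rownames
  cbind(x, extra)
}

levs <- c(LETTERS[1:6], "0")
tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
res <- lookup_join(tmp1a, tmp2a, "col1")
print(res)
```

On this data it returns the same col1/col2 columns as the one-liner, in the original row order.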
On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Sorry, I mixed up out and outa in the last post. Here it is, corrected.
>
> > levs <- c(LETTERS[1:6], "0")
> > tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> >
> > out <- merge(cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
> > out <- out[order(out$seq), -2]  # order() maps seq back to tmp1a's row order; -2 drops seq
> > rownames(out) <- rownames(tmp1a)
> > out
> col1 col2
> 1 A NA
> 2 A NA
> 3 C 1
> 4 C 1
> 5 0 NA
> 6 0 NA
>
> On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > On 3/6/06, Gregor Gorjanc <gregor.gorjanc at gmail.com> wrote:
> > >
> > > But I want to get out
> > >
> > > A NA
> > > A NA
> > > C 1
> > > C 1
> > > 0 NA
> > > 0 NA
> > >
> >
> > That's what I get except for the rownames. Be sure to
> > make the factor levels consistent. I have renamed the data frames
> > tmp1a and tmp2a to distinguish them from the ones in your
> > post and have also reset the rownames to be the original
> > ones, as requested, so that the following is self contained
> > and should be reproducible:
> >
> > > levs <- c(LETTERS[1:6], "0")
> > > tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > > tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> > >
> > > outa <- merge( cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
> > > outa <- outa[out$seq, -2]
> > > rownames(outa) <- rownames(tmp1a)
> > > outa
> > col1 col2
> > 1 0 NA
> > 2 0 NA
> > 3 A NA
> > 4 A NA
> > 5 C 1
> > 6 C 1
> > >
> > > R.version.string # Windows XP
> > [1] "R version 2.2.1, 2005-12-20"
> >
> > By the way, the main limitation of this approach is that it requires the
> > elements of tmp2$col1 to be unique, so that the rows of the result
> > correspond one-to-one to those of tmp1; however, that seems to be the
> > case here.
> >
>
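For the merge() route, the tag-and-restore idea from the thread can also be wrapped in a function. This is a sketch under stated assumptions: merge_keep_order() and the temporary column name .seq are hypothetical, and y must not already contain a .seq column. It restores the order with order() on the tag rather than indexing by the tag directly; indexing by the tag applies the inverse of the needed permutation, which only gives the right answer when that permutation is its own inverse (for example, when merge() left the rows in place). Unlike the match() approach, it keeps every row merge() returns, so a duplicated key in y produces extra rows.

```r
# Hypothetical helper (not in base R): merge(x, y, all.x = TRUE), but return
# the rows in x's original order instead of the order merge() produces.
# Assumes y has no column named ".seq"; extra arguments (e.g. by =) are passed
# on to merge(), but all.x must not be among them.
merge_keep_order <- function(x, y, ...) {
  x$.seq <- seq_len(nrow(x))                   # tag each row of x with its position
  out <- merge(x, y, all.x = TRUE, ...)        # merge() sorts on the key by default
  out <- out[order(out$.seq), , drop = FALSE]  # sort back by the tag (stable for ties)
  out$.seq <- NULL                             # drop the temporary tag
  rownames(out) <- NULL                        # renumber rows 1..n
  out
}

# Hypothetical example data with character keys, deliberately not in sorted order.
x <- data.frame(key = c("b", "c", "a"), stringsAsFactors = FALSE)
y <- data.frame(key = c("a", "c"), val = c(10, 30), stringsAsFactors = FALSE)
res <- merge_keep_order(x, y)
print(res)
```

Here merge() alone would return the keys sorted as a, b, c; the helper returns them as b, c, a, matching x.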
More information about the R-help mailing list