[R] Sort problem in merge()
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Mar 6 18:35:45 CET 2006
Actually, we don't need sort = FALSE if we are reordering it anyway:
out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE)
# order() gives the row positions that put seq back into 1, 2, ..., n
out[order(out$seq), -2]
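
For reference, here is a self-contained sketch of that idea using the sample data from the original post quoted below. The helper column name 'seq' and the object names 'tagged' and 'res' are only illustrative. It reorders with order() on the helper column, so it should not depend on the row order that merge() happens to return.

```r
# Sample data from the original post
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Tag each row of tmp1 with its original position
tagged <- cbind(tmp1, seq = seq_len(nrow(tmp1)))

# Left join on col1; the row order of the result is not relied upon
out <- merge(tagged, tmp2, by = "col1", all.x = TRUE)

# Put the rows back in the order of tmp1 and drop the helper column
res <- out[order(out$seq), c("col1", "col2")]
rownames(res) <- NULL
res
```

The rows of res should follow tmp1 (A, A, C, C, 0, 0), with col2 filled in only for the "C" rows.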
On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> I think you will need to reorder it:
>
> out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = FALSE)
> out[order(out$seq), -2]
>
>
>
> On 3/6/06, Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si> wrote:
> > Gabor Grothendieck wrote:
> > > If you make the levels the same, does that give what you want?
> > >
> > > levs <- c(LETTERS[1:6], "0")
> > > tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > > tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> > > merge(tmp2, tmp1, all = TRUE, sort = FALSE)
> > > merge(tmp1, tmp2, all = TRUE, sort = FALSE)
> >
> > Gabor, thanks for this, but unfortunately the result is the same. I get
> > the following both ways - note that I use all.x or all.y = TRUE.
> >
> > > merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
> > col1 col2
> > 1 C 1
> > 2 C 1
> > 3 A NA
> > 4 A NA
> > 5 0 NA
> > 6 0 NA
> >
> > But I want the same order as in tmp1:
> >
> > col1
> > 1 A
> > 2 A
> > 3 C
> > 4 C
> > 5 0
> > 6 0
> >
> >
> >
> >
> > >>Hello!
> > >>
> > >>I am merging two datasets and I have encountered a problem with sort.
> > >>Can someone please point me to my error. Here is the example.
> > >>
> > >>## I have dataframes, first one with factor and second one with factor
> > >>## and integer
> > >>
> > >>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
> > >>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
> > >>>tmp1
> > >>
> > >> col1
> > >>1 A
> > >>2 A
> > >>3 C
> > >>4 C
> > >>5 0
> > >>6 0
> > >>
> > >>>tmp2
> > >>
> > >> col1 col2
> > >>1 C 1
> > >>2 D 2
> > >>3 E 3
> > >>4 F 4
> > >>
> > >>## Now merge them
> > >>
> > >>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> > >> all.x = TRUE, sort = FALSE))
> > >> col1 col2
> > >>1 C 1
> > >>2 C 1
> > >>3 A NA
> > >>4 A NA
> > >>5 0 NA
> > >>6 0 NA
> > >>
> > >>## As you can see, sorting was applied, since the row order is not the
> > >>## same as in tmp1. Reading the help page ?merge did not reveal much
> > >>## about sorting. However, I did try to see the result of the
> > >>## "non-default" - the help page says that the order should be the same
> > >>## as in 'y'. So the above makes sense.
> > >>
> > >>## Now merge - but swap x and y
> > >>
> > >>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> > >> all.y = TRUE, sort = FALSE))
> > >> col1 col2
> > >>1 C 1
> > >>2 C 1
> > >>3 A NA
> > >>4 A NA
> > >>5 0 NA
> > >>6 0 NA
> > >>
> > >>## The result is the same. I am stumped here. But looking a bit at these
> > >>## objects, I found something peculiar
> > >>
> > >>
> > >>>str(tmp1)
> > >>
> > >>`data.frame': 6 obs. of 1 variable:
> > >> $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
> > >>
> > >>>str(tmp2)
> > >>
> > >>`data.frame': 4 obs. of 2 variables:
> > >> $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
> > >> $ col2: int 1 2 3 4
> > >>
> > >>>str(tmp12)
> > >>
> > >>`data.frame': 6 obs. of 2 variables:
> > >> $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
> > >> $ col2: int 1 1 NA NA NA NA
> > >>
> > >>>str(tmp21)
> > >>
> > >>`data.frame': 6 obs. of 2 variables:
> > >> $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
> > >> $ col2: int 1 1 NA NA NA NA
> > >>
> > >>## Is it OK that the internal representation of factors varies between
> > >>## different merges? The levels also differ: in the first case only the
> > >>## levels from the original data.frame are used, while in the second
> > >>## all levels are propagated.
> > >>
> > >>## I have tried the same with characters
> > >>
> > >>>tmp1$col1 <- as.character(tmp1$col1)
> > >>>tmp2$col1 <- as.character(tmp2$col1)
> > >>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> > >> all.x = TRUE, sort = FALSE))
> > >> col1 col2
> > >>1 C 1
> > >>2 C 1
> > >>3 A NA
> > >>4 A NA
> > >>5 0 NA
> > >>6 0 NA
> > >>
> > >>
> > >>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> > >> all.y = TRUE, sort = FALSE))
> > >> col1 col2
> > >>1 C 1
> > >>2 C 1
> > >>3 A NA
> > >>4 A NA
> > >>5 0 NA
> > >>6 0 NA
> > >>
> > >>## The same with characters. Is this a bug? It definitely does not agree
> > >>## with the help page, since the order is not the same as in 'y'. Can
> > >>## someone please check on newer versions?
> > >>
> > >>## Is there any other way to get the same order as in 'y' i.e. tmp1?
> > >>
> > >>
> > >>>R.version
> > >>
> > >> _
> > >>platform i486-pc-linux-gnu
> > >>arch i486
> > >>os linux-gnu
> > >>system i486, linux-gnu
> > >>status
> > >>major 2
> > >>minor 2.0
> > >>year 2005
> > >>month 10
> > >>day 06
> > >>svn rev 35749
> > >>language R
> > >>
> > >>Thank you very much!
> > >>
> > >>--
> > >>Lep pozdrav / With regards,
> > >> Gregor Gorjanc
> > >>
> > >>----------------------------------------------------------------------
> > >>University of Ljubljana PhD student
> > >>Biotechnical Faculty
> > >>Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
> > >>Groblje 3 mail: gregor.gorjanc <at> bfro.uni-lj.si
> > >>
> > >>SI-1230 Domzale tel: +386 (0)1 72 17 861
> > >>Slovenia, Europe fax: +386 (0)1 72 17 888
> > >>
> > >>----------------------------------------------------------------------
> > >>"One must learn by doing the thing; for though you think you know it,
> > >> you have no certainty until you try." Sophocles ~ 450 B.C.
> > >>
> > >>______________________________________________
> > >>R-help at stat.math.ethz.ch mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > >>
> >
> >
>
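
Another option, not proposed in the thread above, is to skip merge() and do a plain lookup with match(), which keeps the rows of tmp1 exactly where they are. This is only a sketch under the assumption that each col1 value occurs at most once in tmp2 (match() returns the first match only). The names 'idx' and 'res2' are illustrative.

```r
# Sample data from the original post
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Position of each tmp1 key in tmp2; NA where tmp2 has no such key.
# Comparing as character avoids any dependence on the factor levels.
idx <- match(as.character(tmp1$col1), as.character(tmp2$col1))

# Copy tmp1 and pull col2 across by position; row order is untouched
res2 <- tmp1
res2$col2 <- tmp2$col2[idx]
res2
```

Because nothing is reordered, no helper column is needed. If tmp2 can hold several rows per key, merge() with the reordering step is the safer route.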