[R] Sort problem in merge()
Gregor Gorjanc
gregor.gorjanc at bfro.uni-lj.si
Mon Mar 6 15:52:58 CET 2006
Gabor Grothendieck wrote:
> If you make the levels the same does that give what you want:
>
> levs <- c(LETTERS[1:6], "0")
> tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> merge(tmp2, tmp1, all = TRUE, sort = FALSE)
> merge(tmp1, tmp2, all = TRUE, sort = FALSE)
Gabor, thanks for this, but unfortunately the result is the same. I get
the following both ways - note that I use all.x or all.y = TRUE.
> merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
col1 col2
1 C 1
2 C 1
3 A NA
4 A NA
5 0 NA
6 0 NA
But I want the same row order as in tmp1:
col1
1 A
2 A
3 C
4 C
5 0
6 0
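
One possible workaround, as a minimal sketch rather than a fix for merge()
itself: record the row position of tmp1 in a helper column before merging
and reorder by that column afterwards. The column name "ord" is made up for
this illustration; tmp1 and tmp2 are the data frames from the original
example.

```r
## Data from the original example
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

## Remember the original row position of tmp1 in a helper column
## ("ord" is a hypothetical name chosen for this sketch)
tmp1$ord <- seq_len(nrow(tmp1))

## Keep all rows of tmp1; the row order returned by merge() is not relied upon
res <- merge(tmp1, tmp2, by = "col1", all.x = TRUE, sort = FALSE)

## Restore the row order of tmp1 and drop the helper column
res <- res[order(res$ord), c("col1", "col2")]
rownames(res) <- NULL
res
```

The final order comes from order() on the helper column, so it does not
depend on what sort = FALSE does inside merge().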
>>Hello!
>>
>>I am merging two datasets and I have encountered a problem with sort.
>>Can someone please point me to my error? Here is the example.
>>
>>## I have two data frames: the first with a factor, the second with a
>>## factor and an integer
>>
>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
>>>tmp1
>>
>> col1
>>1 A
>>2 A
>>3 C
>>4 C
>>5 0
>>6 0
>>
>>>tmp2
>>
>> col1 col2
>>1 C 1
>>2 D 2
>>3 E 3
>>4 F 4
>>
>>## Now merge them
>>
>>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>                 all.x = TRUE, sort = FALSE))
>> col1 col2
>>1 C 1
>>2 C 1
>>3 A NA
>>4 A NA
>>5 0 NA
>>6 0 NA
>>
>>## As you can see, the rows were reordered, since the row order is not the
>>## same as in tmp1. Reading the help page ?merge did not reveal much about
>>## sorting. However, I did try the "non-default" setting - the help page
>>## says that the order should be the same as in 'y'. So the above
>>## makes sense
>>
>>## Now merge - but swap x and y
>>
>>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>                 all.y = TRUE, sort = FALSE))
>> col1 col2
>>1 C 1
>>2 C 1
>>3 A NA
>>4 A NA
>>5 0 NA
>>6 0 NA
>>
>>## The result is the same. I am stumped here. But looking a bit at these
>>## objects, I found something peculiar
>>
>>
>>>str(tmp1)
>>
>>`data.frame': 6 obs. of 1 variable:
>> $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
>>
>>>str(tmp2)
>>
>>`data.frame': 4 obs. of 2 variables:
>> $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
>> $ col2: int 1 2 3 4
>>
>>>str(tmp12)
>>
>>`data.frame': 6 obs. of 2 variables:
>> $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
>> $ col2: int 1 1 NA NA NA NA
>>
>>>str(tmp21)
>>
>>`data.frame': 6 obs. of 2 variables:
>> $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
>> $ col2: int 1 1 NA NA NA NA
>>
>>## Is it OK that the internal representation of factors varies between
>>## different merges? The levels also differ: in the first merge only the
>>## levels from the original data.frame are used, while in the second all
>>## levels are propagated.
>>
>>## I have tried the same with characters
>>
>>>tmp1$col1 <- as.character(tmp1$col1)
>>>tmp2$col1 <- as.character(tmp2$col1)
>>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>                  all.x = TRUE, sort = FALSE))
>> col1 col2
>>1 C 1
>>2 C 1
>>3 A NA
>>4 A NA
>>5 0 NA
>>6 0 NA
>>
>>
>>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>                  all.y = TRUE, sort = FALSE))
>> col1 col2
>>1 C 1
>>2 C 1
>>3 A NA
>>4 A NA
>>5 0 NA
>>6 0 NA
>>
>>## The same with characters. Is this a bug? It definitely does not agree
>>## with the help page, since the order is not the same as in 'y'. Can someone
>>## please check on newer versions?
>>
>>## Is there any other way to get the same order as in 'y' i.e. tmp1?
>>
>>
>>>R.version
>>
>> _
>>platform i486-pc-linux-gnu
>>arch i486
>>os linux-gnu
>>system i486, linux-gnu
>>status
>>major 2
>>minor 2.0
>>year 2005
>>month 10
>>day 06
>>svn rev 35749
>>language R
>>
>>Thank you very much!
>>
--
Lep pozdrav / With regards,
Gregor Gorjanc
----------------------------------------------------------------------
University of Ljubljana PhD student
Biotechnical Faculty
Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3 mail: gregor.gorjanc <at> bfro.uni-lj.si
SI-1230 Domzale tel: +386 (0)1 72 17 861
Slovenia, Europe fax: +386 (0)1 72 17 888
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
you have no certainty until you try." Sophocles ~ 450 B.C.