[R] Sort problem in merge()
Jean Eid
jeaneid at chass.utoronto.ca
Mon Mar 6 17:15:38 CET 2006
I believe that merge puts first the rows whose cells are nonempty, i.e.
the rows that found a match. For example, if you instead did
> tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F","A"), levs),
+                    col2 = 1:5)
> tmp2
  col1 col2
1    C    1
2    D    2
3    E    3
4    F    4
5    A    5
> merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    A    5
2    A    5
3    C    1
4    C    1
5    0   NA
6    0   NA
> tmp1
  col1
1    A
2    A
3    C
4    C
5    0
6    0
>
and if you do this
> tmp1 <- data.frame(col1 = factor(c("0", "0", "C", "C", "A", "A"), levs))
> merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    C    1
2    C    1
3    A    5
4    A    5
5    0   NA
6    0   NA
>
So I think it is doing what you want it to do.
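If the aim is to get back exactly the row order of tmp1, one possible
workaround (a sketch only, not something the ?merge help page promises) is
to carry an explicit row index through the merge and reorder afterwards.
The column name ord below is made up for illustration; levs, tmp1 and tmp2
are as in Gabor's example.

```r
levs <- c(LETTERS[1:6], "0")
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)

# 'ord' is a hypothetical helper column recording the original row position
tmp1$ord <- seq_len(nrow(tmp1))

# Merge as before; the order of the result does not matter at this point
res <- merge(tmp1, tmp2, by = "col1", all.x = TRUE, sort = FALSE)

# Restore the order of tmp1 and drop the helper column
res <- res[order(res$ord), c("col1", "col2")]
rownames(res) <- NULL
res
```

This relies only on order(), not on what sort = FALSE happens to do
internally.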
Jean
Gregor Gorjanc wrote:
>Gabor Grothendieck wrote:
>
>
>>If you make the levels the same does that give what you want:
>>
>>levs <- c(LETTERS[1:6], "0")
>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
>>merge(tmp2, tmp1, all = TRUE, sort = FALSE)
>>merge(tmp1, tmp2, all = TRUE, sort = FALSE)
>>
>>
>
>Gabor, thanks for this, but unfortunately the result is the same. I get
>the following both ways - note that I use all.x or all.y = TRUE.
>
>
>
>>merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
>>
>>
>  col1 col2
>1    C    1
>2    C    1
>3    A   NA
>4    A   NA
>5    0   NA
>6    0   NA
>
>But I want this order, as it is in tmp1
>
>  col1
>1    A
>2    A
>3    C
>4    C
>5    0
>6    0
>
>
>
>
>
>
>>>Hello!
>>>
>>>I am merging two datasets and I have encountered a problem with sort.
>>>Can someone please point me to my error. Here is the example.
>>>
>>>## I have dataframes, first one with factor and second one with factor
>>>## and integer
>>>
>>>
>>>
>>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
>>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
>>>>tmp1
>>>>
>>>>
>>>  col1
>>>1    A
>>>2    A
>>>3    C
>>>4    C
>>>5    0
>>>6    0
>>>
>>>
>>>
>>>>tmp2
>>>>
>>>>
>>>  col1 col2
>>>1    C    1
>>>2    D    2
>>>3    E    3
>>>4    F    4
>>>
>>>## Now merge them
>>>
>>>
>>>
>>>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>
>>>>
>>> all.x = TRUE, sort = FALSE))
>>>  col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## As you can see, sort was applied, since the row order is not the same
>>>## as in tmp1. Reading the help page ?merge did not reveal much about
>>>## sorting. However, I did try to see the result of the "non-default" -
>>>## the help page says that the order should be the same as in 'y'. So the
>>>## above makes sense
>>>
>>>## Now merge - but swap x and y
>>>
>>>
>>>
>>>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>
>>>>
>>> all.y = TRUE, sort = FALSE))
>>>  col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## The result is the same. I am stumped here. But looking a bit at these
>>>## objects I found something peculiar
>>>
>>>
>>>
>>>
>>>>str(tmp1)
>>>>
>>>>
>>>`data.frame': 6 obs. of 1 variable:
>>>$ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
>>>
>>>
>>>
>>>>str(tmp2)
>>>>
>>>>
>>>`data.frame': 4 obs. of 2 variables:
>>>$ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
>>>$ col2: int 1 2 3 4
>>>
>>>
>>>
>>>>str(tmp12)
>>>>
>>>>
>>>`data.frame': 6 obs. of 2 variables:
>>>$ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
>>>$ col2: int 1 1 NA NA NA NA
>>>
>>>
>>>
>>>>str(tmp21)
>>>>
>>>>
>>>`data.frame': 6 obs. of 2 variables:
>>>$ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
>>>$ col2: int 1 1 NA NA NA NA
>>>
>>>## Is it OK that the internal representation of factors varies between
>>>## different merges? The levels are also different: in the first case
>>>## only levels from the original data.frame are used, while in the
>>>## second example all levels are propagated.
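
On the differing levels: a small check, offered as a sketch, suggests the
level set of the merged key simply follows the inputs. With a shared level
set (levs, as in Gabor's suggestion quoted above), both merge directions
should end up with the same levels. The names m12 and m21 are made up here.

```r
levs <- c(LETTERS[1:6], "0")
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)

# Merge in both directions, keeping all rows of tmp1 each time
m12 <- merge(tmp1, tmp2, all.x = TRUE, sort = FALSE)
m21 <- merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)

# Compare the level sets of the key column in the two results
levels(m12$col1)
levels(m21$col1)
setequal(levels(m12$col1), levels(m21$col1))
```
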
>>>
>>>## I have tried the same with characters
>>>
>>>
>>>
>>>>tmp1$col1 <- as.character(tmp1$col1)
>>>>tmp2$col1 <- as.character(tmp2$col1)
>>>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>
>>>>
>>> all.x = TRUE, sort = FALSE))
>>>  col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>
>>>
>>>
>>>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>
>>>>
>>> all.y = TRUE, sort = FALSE))
>>>  col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## The same with characters. Is this a bug? It definitely does not agree
>>>## with the help page, since the order is not the same as in 'y'. Can
>>>## someone please check on newer versions?
>>>
>>>## Is there any other way to get the same order as in 'y' i.e. tmp1?
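
One other way, as a rough sketch: when tmp2 has at most one row per key and
only a column or two needs to be pulled over, a match() lookup leaves tmp1
in place and in its original order. It assumes tmp1 and tmp2 as defined at
the top of this quoted message; looked_up is a made-up name. This is a
lookup, not a general replacement for merge(): duplicate keys in tmp2 would
be ignored after the first hit.

```r
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# match() compares factors by their labels (they are converted to
# character), so the differing level sets do not matter here
idx <- match(tmp1$col1, tmp2$col1)

# NA in idx (no match) yields NA in col2; the row order of tmp1 is unchanged
looked_up <- cbind(tmp1, col2 = tmp2$col2[idx])
looked_up
```
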
>>>
>>>
>>>
>>>
>>>>R.version
>>>>
>>>>
>>> _
>>>platform i486-pc-linux-gnu
>>>arch i486
>>>os linux-gnu
>>>system i486, linux-gnu
>>>status
>>>major 2
>>>minor 2.0
>>>year 2005
>>>month 10
>>>day 06
>>>svn rev 35749
>>>language R
>>>
>>>Thank you very much!
>>>
>>>--
>>>Lep pozdrav / With regards,
>>> Gregor Gorjanc
>>>
>>>----------------------------------------------------------------------
>>>University of Ljubljana PhD student
>>>Biotechnical Faculty
>>>Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
>>>Groblje 3 mail: gregor.gorjanc <at> bfro.uni-lj.si
>>>
>>>SI-1230 Domzale tel: +386 (0)1 72 17 861
>>>Slovenia, Europe fax: +386 (0)1 72 17 888
>>>
>>>----------------------------------------------------------------------
>>>"One must learn by doing the thing; for though you think you know it,
>>>you have no certainty until you try." Sophocles ~ 450 B.C.
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>>
>>>
>>>
>
>
>
>